X, P    the joint state estimate of all targets, and the covariance of the joint state estimate

$N_m(k)$    the number of measurements in the k-th set (i.e., the k-th scan)

$N_t$    the number of targets under track

$\theta_{ji}(k)$    a single measurement association event, indicating the association of measurement j with target i at sample k

$\Theta_l(k)$    the l-th joint association event for measurement set k, containing a single measurement event for each of the $N_m(k)$ measurements received in the k-th scan

xi’u(k)    the w-th association history event, which contains a joint association event for each scan from 1 to k
Notation    Usage

$N_h(k)$    the number of association hypotheses in the tracking system after incorporation of the k-th set of measurements

^ Nh(k)    the full parameters (weights, means, covariances) of the $N_h$ association hypotheses after incorporation of the k-th set of measurements

$N_r(k)$    the number of association hypotheses at the end of the k-th processing cycle, after hypothesis reduction has been applied

^Nr(k)    the parameters of the reduced set of $N_r$ hypotheses
Abstract

The  problem  of  tracking  multiple  maneuvering  targets  in  clutter  naturally  leads 
to  a  Gaussian  mixture  representation  of  the  Probability  Density  Function  (PDF) 
of  the  target  state  vector.  State-of-the-art  Multiple  Hypothesis  Tracking  (MHT) 
techniques  maintain  the  mean,  covariance  and  probability  weight  corresponding  to 
each  hypothesis,  yet  they  rely  on  ad  hoc  merging  and  pruning  rules  to  control  the 
growth  of  hypotheses.  This  thesis  investigates  the  performance  benefit  achievable 
by  applying  a  structured  cost  function-based  approach  to  the  hypothesis  control 
problem. 

A  new  cost  function,  the  Integral  Square  Difference  (ISD)  cost,  is  proposed 
for  measuring  the  difference  between  the  full  target  state  PDF  and  a  reduced-order 
approximation. The ISD cost function is physically meaningful, and, unlike any previously
proposed cost function, it is also mathematically tractable, requiring neither
numerical  integration  nor  approximation  for  evaluation.  A  reduction  algorithm  is 
proposed  which  selects  components  for  merging  or  pruning  to  minimize  the  increase 
in  the  ISD  cost.  This  solution  is  used  directly,  and  also  as  the  starting  point  for  an 
iterative  gradient-based  optimization. 

The  performance  of  the  ISD-based  algorithm  for  tracking  a  single  target  in 
heavy  clutter  is  compared  to  that  of  Salmond’s  joining  filter,  which  previously  had 
provided  the  highest  performance  in  the  scenario  examined.  For  a  large  number  of 
mixture  components,  it  is  shown  that  the  ISD  algorithm  outperforms  the  joining 
filter  remarkably,  yielding  an  average  track  life  more  than  double  that  achievable 
using  the  joining  filter.  The  results  indicate  that  the  tracking  performance  of  the 
ISD-based  filter  in  heavy  clutter  is  significantly  higher  than  achievable  using  any 
previously  published  algorithm. 
GAUSSIAN MIXTURE REDUCTION
FOR TRACKING MULTIPLE MANEUVERING TARGETS
IN CLUTTER


I.  Introduction 


In their earliest form, radar systems were able to track a single target in a
clutter-free environment. The limited surveillance capability essentially presented
the  operator  with  a  raw  display  of  measurements,  leaving  it  up  to  the  human  to 
interpret  the  display  and  infer  information  such  as  velocity.  Early  tracking  radars 
illuminated  the  target  continually  to  ensure  that  knowledge  of  the  target  position 
did  not  deteriorate,  causing  loss  of  track.  This  prevented  the  radar  from  performing 
any  other  tasks  at  the  same  time,  such  as  tracking  multiple  targets  or  maintaining 
surveillance  capability  during  track. 


The  vast  body  of  theory  of  stochastic  estimation  developed  since  the  1960s  has 
enabled  revolutionary  changes  to  the  design  and  capability  of  radar  systems.  Modern 
radar  systems,  such  as  those  utilizing  Electronically  Scanned  Array  (ESA)  antenna 
technology [5], [7], [51], can perform multiple functions at once, simultaneously providing
high quality tracking estimates for some targets while maintaining wide-area
surveillance  of  the  entire  operational  theater.  Virtually  every  modern  surveillance 
radar  operates  in  Track  While  Scan  (TWS)  mode,  in  which  the  radar  continually 
searches  for  existence  of  new  targets,  but  once  detected,  targets  are  also  tracked 
using data filtering techniques [53].


The essential function of a modern radar system is to maintain as much knowledge
of the target state¹ as possible. The success of modern tracking techniques is
largely determined by their ability to compute and store the Probability Density
Function (PDF) of target state in an efficient manner. This study cuts to the core of
this problem: how can the PDF of the target state vector be reduced in complexity
such that the system remains computationally tractable, while causing the smallest
possible change in the overall structure of the PDF?

¹ i.e., position, velocity, acceleration, etc.

[Figure 1.1: scatter plot; legend: + Measurements, • Targets]

Figure 1.1. The difficulty of data association: the origin of each measurement
is not known, hence the system does not know which measurement belongs to
which target, and which measurements are false alarms (due to radar clutter).

1.1 Motivation

In  order  to  maintain  knowledge  of  a  target’s  kinematic  state,  a  radar  system 
must  be  able  to  update  its  target  state  model  using  the  radar  detections  produced 
during  each  scan  interval.  Data  association  algorithms  are  the  tools  utilized  to 
perform this update. The difficulty in such algorithms is illustrated in Figure 1.1:
the  system  is  provided  with  a  set  of  detections,  each  of  which  indicates  the  possible 
presence  of  a  target.  However,  the  system  does  not  know  which  measurement  belongs 
to  which  target,  or  which  measurements  are  actually  false  alarms  (the  result  of  radar 
clutter),  hence  the  best  way  to  update  the  state  estimates  for  each  target  using  the 
measurements  is  unclear. 

A Gaussian mixture, consisting of a weighted sum of Gaussian PDFs, each
with different means and covariances, is the natural form of the PDF of target
state in this problem. Using such a structure, a mixture component is created for
every possible association event², with the mean and covariance calculated assuming
that the particular hypothesis is true, and the weight calculated to represent the
probability that the particular hypothesis is true. An example of a Gaussian mixture
is shown in Figure 1.2, with the individual weighted component PDFs shown using
dashed lines, and the overall PDF (the sum of the components) shown using a solid
line.

² i.e., every possible pairing of targets and measurements.

[Figure 1.2: plot titled "Gaussian Mixture PDF"]

Figure 1.2. An example of a Gaussian mixture: the individual weighted Gaussian
component PDFs are shown using dashed lines; the overall PDF (the sum of the
components) is shown using a solid line.

The difficulty of data association is that every association hypothesis from the
previous processing cycle must be paired with every association event from the current
set of measurements, and a new association hypothesis must be created for each
pairing. For example, if a system commences with a single association hypothesis,
and the first set of measurements received gives rise to 20 possible association events,
then there will be 20 association hypotheses. If the following set of measurements
produces 30 possible association events, then the number of hypotheses will increase
to 600: one for each pairing of previous hypothesis and new association event. Each
hypothesis will require a corresponding Gaussian mixture component, each with a
different mean, covariance and probability weight; and the number of components
will grow exponentially with time. It is therefore necessary to employ methods of
reducing the complexity of the mixture while maintaining its overall form as well as
possible. Typically many of the hypotheses are very similar, or contribute a very
small probability weight, hence it is possible to reduce the number of mixture components
without modifying the PDF structure significantly. This is illustrated in
Figure 1.3, which shows optimized approximations of the PDF of Figure 1.2 (which
has five mixture components) using four, three and two mixture components.

[Figure 1.3: four panels titled "Original Density", "4-Component Approximation",
"3-Component Approximation" and "2-Component Approximation"]

Figure 1.3. Approximating a Gaussian mixture using fewer mixture components.
The original mixture is shown in the top left figure, alongside approximations
using four components (top right), three components (bottom left) and
two components (bottom right).
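The growth in the number of hypotheses described in the 20-event and 30-event example can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `hypothesis_counts` and its arguments are assumptions introduced here, and it models the case in which no merging or pruning is applied.

```python
def hypothesis_counts(events_per_scan, initial=1):
    """Number of association hypotheses after each scan when every existing
    hypothesis is paired with every association event of the new scan and
    no hypothesis reduction is applied."""
    counts = []
    total = initial
    for n_events in events_per_scan:
        total *= n_events          # one new hypothesis per (old hypothesis, event) pair
        counts.append(total)
    return counts


# The example from the text: 20 events in the first scan, 30 in the second.
print(hypothesis_counts([20, 30]))  # [20, 600]
```

Extending the list of per-scan event counts shows how quickly the product becomes unmanageable, which is the motivation for mixture reduction.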
The Multiple Hypothesis Tracker (MHT) [4:334-340, 23283-300, 7:360-369,
40] is the state-of-the-art tracking algorithm for modern civilian and military radar
systems.  The  algorithm  directly  maintains  the  Gaussian  mixture  representation  of 
the  target  state  PDF,  retaining  multiple  association  hypotheses,  each  represented 
by  a  mixture  component,  with  a  probability  weight,  mean  vector  and  covariance 
matrix.  The  concept  of  the  MHT  is  to  provide  a  deferred  decision-making  structure, 
such  that  target-measurement  association  decisions,  which  are  uncertain  at  a  given 
processing  cycle,  can  be  made  at  a  later  time  after  further  information  has  been 
received  [8].  Although  the  correct  hypothesis  may  not  be  the  most  likely  at  a  given 
instant  in  time,  as  more  sets  of  measurements  are  received,  hypotheses  due  to  random 
clutter  will  become  less  likely,  making  the  correct  hypothesis  comparatively  more 
likely.  Since  the  number  of  hypotheses  grows  exponentially  with  time,  any  practical 
implementation  must  apply  some  form  of  simplification  to  the  PDF,  most  commonly 
performed  by  merging  similar  hypotheses  together,  and  deleting  (pruning)  unlikely 
hypotheses.  The  effectiveness  of  the  deferred  decision-making  structure  is  completely 
dependent  on  whether  the  correct  hypothesis  remains  in  the  Gaussian  mixture  when 
clarifying  measurements  are  received,  a  function  which  is  determined  purely  by  the 
merging  and  pruning  logic. 
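The two simplification operations named above, pruning and merging, can be sketched for a one-dimensional mixture as follows. This is a hedged illustration rather than the logic of any particular MHT implementation: the component layout `(weight, mean, variance)`, the function names, and the weight threshold are assumptions made here, and the merge shown is the common moment-preserving merge, which keeps the combined weight, mean and variance of the two components.

```python
def prune(mixture, min_weight):
    """Delete components whose weight is below min_weight and renormalize.
    Assumes at least one component survives the threshold."""
    kept = [c for c in mixture if c[0] >= min_weight]
    total = sum(w for w, _, _ in kept)
    return [(w / total, m, p) for w, m, p in kept]


def merge(c1, c2):
    """Moment-preserving merge of two (weight, mean, variance) components."""
    w1, m1, p1 = c1
    w2, m2, p2 = c2
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    # The spread of the two means about the merged mean adds to the variance.
    p = (w1 * (p1 + (m1 - m) ** 2) + w2 * (p2 + (m2 - m) ** 2)) / w
    return (w, m, p)


mixture = [(0.50, 0.0, 1.0), (0.45, 0.3, 1.2), (0.05, 8.0, 1.0)]
reduced = prune(mixture, min_weight=0.1)       # drops the unlikely component
print(merge(reduced[0], reduced[1]))           # single merged component
```

Note that pruning discards the deleted hypothesis entirely, which is why the text stresses that the deferred decision structure depends on the correct hypothesis surviving this step.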

Few mixture reduction algorithms have been published in open literature.
Salmond [44-47] proposes an algorithm which combines the mixture components
which are closest in the sense of a given ad hoc distance measure. Salmond notes
that the ideal solution would be to search for the solution which optimizes a meaningful
cost function, but concludes that the computational expense of such an undertaking
would be problematic. Fifteen years later, the simulations once performed
on a Cray 1S supercomputer can now be run on a common home computer, so it
would seem appropriate to challenge such assumptions.
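As a concrete example of a cost function of this kind, the sketch below evaluates an integral square difference between two one-dimensional Gaussian mixtures in closed form. It assumes that the cost is the integral of the squared difference between the two densities, and it relies on the standard identity that the integral of the product of two Gaussians N(x; m1, P1) and N(x; m2, P2) equals N(m1; m2, P1 + P2). The function names and the `(weight, mean, variance)` layout are illustrative assumptions, not the formulation developed later in the thesis.

```python
import math


def gauss(x, mean, var):
    """Scalar Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cross_term(mix_a, mix_b):
    """Closed-form integral of the product of two 1-D Gaussian mixtures."""
    return sum(wa * wb * gauss(ma, mb, pa + pb)
               for wa, ma, pa in mix_a
               for wb, mb, pb in mix_b)


def isd(full, reduced):
    """Integral of (full(x) - reduced(x))^2 over x, with no numerical integration."""
    return (cross_term(full, full)
            - 2.0 * cross_term(full, reduced)
            + cross_term(reduced, reduced))


full = [(0.6, 0.0, 1.0), (0.4, 3.0, 2.0)]
reduced = [(1.0, 1.2, 3.5)]                 # hypothetical one-component approximation
print(isd(full, reduced))                   # positive; zero only if the PDFs coincide
```

Because every term is a Gaussian evaluation, the cost of one evaluation grows with the product of the component counts rather than with the resolution of an integration grid.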

The essence of this study is to reduce a Gaussian mixture model from a larger
number of components to a smaller number of components, while modifying the PDF
as a whole as little as possible, as illustrated in Figure 1.3. Although the application
of interest in this study is target tracking, the algorithm is equally relevant to any
of the rich class of research areas to which the Gaussian mixture has been applied.
This includes a wide variety of statistical classification problems such as speech and
speaker recognition [141 EH], lip recognition [14], [55], face recognition [14] and image
segmentation [Ml [58], to name a few.

1.2  Research  Goal 

The  goal  of  this  study  is  to  develop  techniques  of  maintaining  the  PDF  of  joint 
target  state  with  higher  fidelity  than  allowed  by  existing  methods.  The  major  focus 
of  the  research  is  to  concentrate  on  the  efficiency  of  the  representation  in  order  to 
provide  the  best  description  of  the  target  distribution  using  the  most  compact  set  of 
parameters  possible. 

Considering this focus on efficiency, the study will commence by examining
the difficulties associated with algorithms based on Probabilistic Data Association
(PDA) [2, 3], which provide an extremely compact representation of the target state,
retaining only a single Gaussian component. Subsequently, methods will be developed
to reduce the number of components in a Gaussian mixture while effecting the
smallest change possible in the overall PDF structure. In this way, these techniques
will generalize PDA, providing the best representation possible
of the target state PDF for a given number of Gaussian mixture components.

1.3  Assumptions 

The  assumptions  made  in  this  study  arise  out  of  the  probabilistic  model  of  joint 
association events presented in Section 2.5.2. Firstly, the problem of target existence
is  not  addressed,  and  the  algorithms  developed  assume  knowledge  of  the  number  of 
targets  present.  Through  this  assumption,  we  concentrate  the  research  effort  on 
developing  the  best  possible  method  of  maintaining  knowledge  of  the  target  state 
PDF; target existence considerations can be incorporated later as presented in [42].
Secondly,  the  measurement  model  assumes  that  each  measurement  belongs  to  one 
target  and  one  target  only,  hence  ignoring  the  possible  case  in  which  two  targets 
are  within  the  same  radar  resolution  cell  and  provide  a  single  merged  measurement. 
While  this  assumption  will  be  violated  in  situations  in  which  targets  are  extremely 
closely-spaced, such cases can be handled as an exception (as presented in [27], [28]),
and  the  overall  solution  form  is  unchanged.  Finally,  the  measurement  model  assumes 
that  each  target  gives  rise  to  no  more  than  one  measurement,  hence  ignoring  the 
possible  case  in  which  a  large,  near  target  spans  multiple  resolution  cells  and  gives 
rise  to  several  measurements.  Again,  extensions  can  be  derived  to  handle  such  cases 
explicitly,  and  the  overall  form  of  the  solution  remains  unchanged. 

The simulations presented in Chapter IV also assume a linear Cartesian measurement
model in order to employ the standard Kalman filter, as presented in
Section 2.2.3. The linear measurement model was selected in order to concentrate
the study on the impact of data association; the linear model could be replaced by
a nonlinear polar measurement model (as provided by conventional radar systems)
by replacing the Kalman filters in the structure with extended Kalman filters, as
discussed in Section 2.2.4. The results presented in [44] utilize a linear measurement
model for similar reasons.

1.4  Thesis  Organization 

Chapter II examines the background of target tracking, comparing and contrasting
current techniques and highlighting areas of possible improvement. Section
2.2 reviews the basic structures of single-target non-maneuvering tracking filters,
presenting the background of the Kalman filter. Section 2.3 briefly considers the
theory and practice of Gaussian mixture models, which are utilized throughout the
remainder of the thesis. Section 2.4 outlines the extensions which have been made
to the Kalman filter to address the issue of maneuvering targets, deriving the algorithm
which is currently considered state-of-the-art, the Interacting Multiple Model
(IMM) estimator [3:461-465, 35]. Section 2.5 derives in detail the probabilistic model
of joint association events, and then uses this model to highlight the differences and
similarities of the current generation of data association algorithms. Section 2.5.13
briefly outlines some of the methods which have been proposed in open literature
to combine the maneuvering techniques of Section 2.4 with the data association
techniques of Section 2.5 to aid tracking of multiple maneuvering targets in clutter.
Finally, Section 2.6 briefly reviews some of the techniques of numerical optimization
which will be utilized in Chapter III.

Chapter III commences by analyzing some of the difficulties observed with the
Joint Probabilistic Data Association (JPDA) [2, 3] and Coupled Probabilistic Data
Association (CPDA) [12] algorithms, providing new insight into the target bias and
track coalescence phenomena examined in [12], [16]. Subsequently, an algorithm is
developed for reducing the number of components in a Gaussian mixture, starting in
Section 3.3.1 by considering possible cost measures which could be used to measure
the deviation from the original PDF caused by the reduction. Section 3.3.2 analyzes
our chosen cost function, the Integral Square Difference (ISD) cost, in detail before
Section 3.3.3 applies the iterative optimization techniques presented in Section 2.6
to find the set of parameters for the reduced PDF which minimizes the cost. Section
553 proposes an algorithm which can be used to initialize the iterative optimization,
a function which is very important, considering the multi-modal structure of the cost
function.

Chapter IV commences by testing the initialization and iterative optimization
techniques on a simple one-dimensional problem, providing a graphical demonstration
of their operation. Sections 4.4 to 4.5 then present results of computer simulations
applying the algorithm to the problems of tracking a single target in clutter,
tracking multiple targets in clutter and tracking a single maneuvering target.
Chapter V concludes by summarizing the results and suggesting areas for further
investigation.
II. Background


2.1 Introduction

The discipline of target tracking spans a wide range of theory, incorporating
basic state estimation, multiple model estimation techniques, and data association
methods. Almost all modern tracking systems utilize the Kalman filter as the central
tool for state estimation; this method is described in Section 2.2. When a target
is maneuvering, the success of the standard Kalman filter can be limited severely.
Alternative structures using several Kalman filters in parallel have proven successful
in this problem; these are described in Section 2.4. The ambiguity of the origin of
measurements is unavoidable in tracking systems; the most common methodologies
of dealing with this problem are discussed in Section 2.5. Section 2.3 briefly outlines
the Gaussian mixture, which is the statistical model that arises naturally in these
problems, and Section 2.6 reviews the theory of iterative optimization which will be
employed in Chapter III.

2.2  Tracking  Filters 

2.2.1  Introduction.  The  following  sections  review  the  fundamental  tools  of 
target  tracking,  most  importantly,  the  Kalman  filter.  The  material  presented  is  not 
intended  to  be  a  complete  coverage  of  the  topic  area;  rather,  major  outcomes  are 
stated and pointers are given to useful reference material that describes the field in
more  detail. 

2.2.2  Ad  Hoc  Techniques.  The  concept  that  commenced  the  revolution  in 
surveillance  radar  performance  was  that  of  incorporating  dynamics  models  into  the 
tracking  system.  By  utilizing  the  equations  of  Newtonian  dynamics,  such  models 
make  it  possible  to  predict  the  future  location  of  the  target,  freeing  the  radar  to 
perform  other  tasks  between  updates.  The  dynamics  models  can  be  based  on  simple 
constant velocity assumptions, such as:

$$
\begin{aligned}
x(k) &= x(k-1) + v(k-1)\,[t_k - t_{k-1}] \\
v(k) &= v(k-1)
\end{aligned}
\tag{2.1}
$$

where $x$ is the position of the target, $v$ is the velocity and $[t_k - t_{k-1}]$ is the time
difference between the k-th and (k-1)-th measurement instants. If the velocity of
the target is not well modelled as constant, then constant acceleration models can
be employed such as:

$$
\begin{aligned}
x(k) &= x(k-1) + v(k-1)\,[t_k - t_{k-1}] + \tfrac{1}{2}\,a(k-1)\,[t_k - t_{k-1}]^2 \\
v(k) &= v(k-1) + a(k-1)\,[t_k - t_{k-1}] \\
a(k) &= a(k-1)
\end{aligned}
\tag{2.2}
$$

where $a$ is the acceleration of the target. Early tracking methods based upon these
models estimated the position, velocity, and where necessary, acceleration, of the
target using a weighted average of the current measurement and the value predicted
using Eq. (2.1) or (2.2). The tracking filter based on the constant velocity assumption
is referred to as the α-β tracker, and operates as follows [50:260]:

$$
\begin{aligned}
\hat{x}(k|k) &= \hat{x}(k|k-1) + \alpha\,[z(k) - \hat{x}(k|k-1)] \\
\hat{v}(k|k) &= \hat{v}(k|k-1) + \frac{\beta}{t_k - t_{k-1}}\,[z(k) - \hat{x}(k|k-1)]
\end{aligned}
\tag{2.3}
$$

where $z(k)$ is the k-th measurement of target position, occurring at time $t_k$. The
notation $\hat{x}(k|k-1)$ is used to represent the estimated position of the target at time
$t_k$, predicted using Eq. (2.1) and measurements up to time $t_{k-1}$, and $\hat{x}(k|k)$ represents
the estimated position of the target after incorporation of the measurement
$z(k)$. Similarly, $\hat{v}(k|k-1)$ is the estimated velocity before incorporation of the new
measurement, and $\hat{v}(k|k)$ is the estimated velocity after incorporation of the measurement.
The α-β tracker receives its name from the coefficients, α and β, that
are used as weighting factors to perform the updates. If α and β are zero, then the
system relies purely on the predictions provided by the dynamics model embedded
in the system. Conversely, if α and β are one, then the system ignores its
dynamics model, and relies purely on the latest measurement. Thus, by adjusting
α and β, the designer has a trade-off between the weight that the system places on
the past measurements, as propagated through the dynamics model, and the weight
that the system places on the newly introduced measurement.
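One processing cycle of the α-β tracker, combining the prediction of Eq. (2.1) with the update of Eq. (2.3), can be sketched as follows. The function name, argument names and the gain values in the usage lines are illustrative assumptions; the text gives no recommended values for α and β.

```python
def alpha_beta_step(x_est, v_est, z, dt, alpha, beta):
    """One alpha-beta cycle: predict with Eq. (2.1), then update with Eq. (2.3).

    x_est, v_est : position and velocity estimates after the previous update
    z            : new position measurement
    dt           : time since the previous measurement, t_k - t_{k-1}
    """
    # Prediction using the constant-velocity model.
    x_pred = x_est + v_est * dt
    v_pred = v_est
    # Weighted correction using the difference between measurement and prediction.
    residual = z - x_pred
    x_new = x_pred + alpha * residual
    v_new = v_pred + (beta / dt) * residual
    return x_new, v_new


# Noise-free measurements of a target moving at 2 units per second.
x, v = 0.0, 0.0
for k in range(1, 21):
    x, v = alpha_beta_step(x, v, z=2.0 * k, dt=1.0, alpha=0.5, beta=0.2)
print(x, v)   # estimates approach position 40 and velocity 2
```

Setting both gains to zero reproduces pure prediction, and setting `alpha` to one makes the position estimate equal to the latest measurement, matching the two limiting cases described in the text.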

The α-β-γ tracker operates similarly, incorporating an additional weighting
factor to aid in estimation of the acceleration (which is assumed constant) [6:21]:

$$
\begin{aligned}
\hat{x}(k|k) &= \hat{x}(k|k-1) + \alpha\,[z(k) - \hat{x}(k|k-1)] \\
\hat{v}(k|k) &= \hat{v}(k|k-1) + \frac{\beta}{t_k - t_{k-1}}\,[z(k) - \hat{x}(k|k-1)] \\
\hat{a}(k|k) &= \hat{a}(k|k-1) + \frac{\gamma}{(t_k - t_{k-1})^2}\,[z(k) - \hat{x}(k|k-1)]
\end{aligned}
\tag{2.4}
$$

These equations operate as per the α-β tracker, with the prediction between measurement
intervals performed using the constant acceleration model of Eq. (2.2), and
γ representing the weighting coefficient used to update the acceleration estimate of
the model.
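The corresponding cycle for the α-β-γ tracker differs only in the prediction model, Eq. (2.2), and the extra acceleration update of Eq. (2.4). The sketch below follows those two equations directly; the function and argument names are assumptions made for illustration.

```python
def alpha_beta_gamma_step(x_est, v_est, a_est, z, dt, alpha, beta, gamma):
    """One alpha-beta-gamma cycle: predict with Eq. (2.2), update with Eq. (2.4)."""
    # Prediction using the constant-acceleration model.
    x_pred = x_est + v_est * dt + 0.5 * a_est * dt ** 2
    v_pred = v_est + a_est * dt
    a_pred = a_est
    # All three states are corrected with the same position residual.
    residual = z - x_pred
    x_new = x_pred + alpha * residual
    v_new = v_pred + (beta / dt) * residual
    a_new = a_pred + (gamma / dt ** 2) * residual
    return x_new, v_new, a_new


# With all gains at zero the step reduces to pure prediction.
print(alpha_beta_gamma_step(0.0, 1.0, 2.0, z=99.0, dt=1.0,
                            alpha=0.0, beta=0.0, gamma=0.0))  # (2.0, 3.0, 2.0)
```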

The α-β and α-β-γ trackers can exhibit very good performance, provided that
the weighting coefficients are selected carefully [3:288-289]. However, rules for selecting
these coefficients are largely ad hoc, relying more on trial and error than on
mathematical theory.

2.2.3 Kalman Filter. The Kalman filter is the tool which provides a mathematical
basis for ad hoc methods such as the α-β and α-β-γ trackers, and at the
same time gives a mechanism for calculating the optimal values of the weighting
coefficients. The filter is based on the simple linear state model:¹

$$
\begin{aligned}
x(k) &= \Phi(k, k-1)\,x(k-1) + G_d(k-1)\,w(k-1) \\
z(k) &= H(k)\,x(k) + v(k)
\end{aligned}
\tag{2.5}
$$

where $x(k)$ is the true state of the system at sample instant k; $z(k)$ is the
noise-corrupted measurement supplied to the estimator at time instant k; $\Phi$, $G_d$ and $H$
are known system matrices; and $w(k)$ and $v(k)$ are two mutually independent white
Gaussian noise processes (also independent of prior knowledge of $x$) such that:

$$
\begin{aligned}
E\{w(k)\,w(l)^T\} &= Q_d(k)\,\delta_{kl} \\
E\{v(k)\,v(l)^T\} &= R(k)\,\delta_{kl}
\end{aligned}
$$

where $\delta_{kl}$ is the Kronecker delta function (unity when k = l, zero otherwise).

If the prior knowledge of $x$ indicates that it follows a Gaussian Probability
Density Function (PDF) with mean $\hat{x}(k-1|k-1)$ and covariance $P(k-1|k-1)$:

$$
\begin{aligned}
f\{x(k-1) \mid Z^{k-1}\} &= |2\pi P(k-1|k-1)|^{-1/2} \exp\Big\{ -\tfrac{1}{2}\,[x(k-1) - \hat{x}(k-1|k-1)]^T \\
&\qquad \cdot P(k-1|k-1)^{-1}\,[x(k-1) - \hat{x}(k-1|k-1)] \Big\} \\
&= \mathcal{N}\{x(k-1);\ \hat{x}(k-1|k-1),\ P(k-1|k-1)\}
\end{aligned}
$$

then it can be shown that the PDF of the state $x$ propagated forward to sample
period k remains Gaussian [34:208-209]:

$$
f\{x(k) \mid Z^{k-1}\} = \mathcal{N}\{x(k);\ \hat{x}(k|k-1),\ P(k|k-1)\}
$$

¹ Shown in discrete-time form, even though there will inevitably be an underlying
continuous-time system.
where the mean $\hat{x}(k|k-1)$ and covariance $P(k|k-1)$ are described by the Kalman
filter propagation equations:

$$
\begin{aligned}
\hat{x}(k|k-1) &= \Phi(k, k-1)\,\hat{x}(k-1|k-1) \\
P(k|k-1) &= \Phi(k, k-1)\,P(k-1|k-1)\,\Phi(k, k-1)^T + G_d(k-1)\,Q_d(k-1)\,G_d(k-1)^T
\end{aligned}
\tag{2.6}
$$

Although Kalman first derived the measurement update algorithm using insights
from orthogonal projection [23], it is easily understood for the case of Gaussian
PDFs by applying Bayes’ rule:

$$
\begin{aligned}
f\{x(k) \mid Z^k\} &= f\{x(k) \mid z(k), Z^{k-1}\} \\
&= \frac{f\{x(k), z(k) \mid Z^{k-1}\}}{f\{z(k) \mid Z^{k-1}\}} \\
&= \frac{f\{z(k) \mid x(k), Z^{k-1}\}\, f\{x(k) \mid Z^{k-1}\}}{f\{z(k) \mid Z^{k-1}\}}
\end{aligned}
\tag{2.7}
$$

The first term in the numerator of Eq. (2.7) represents the PDF of the measurement
vector conditioned on the true value of the target state vector, as well as the
previous measurement history. Given the relationship of Eq. (2.5), this PDF will
thus be Gaussian with a mean of $H(k)x(k)$ (where $x(k)$ is the true value of the state
vector), and covariance $R(k)$. The latter term in the numerator of Eq. (2.7) represents the
knowledge of the current state conditioned on the previous measurements: this will be the
Gaussian function propagated from the previous processing cycle with parameters
as per Eq. (2.6). The denominator of Eq. (2.7) is simply the marginal density of the
measurements, calculated as the integral of the numerator over all $x$:

$$
\begin{aligned}
f\{z(k) \mid Z^{k-1}\} &= \int_{-\infty}^{\infty} f\{x(k), z(k) \mid Z^{k-1}\}\, dx(k) \\
&= \int_{-\infty}^{\infty} f\{z(k) \mid x(k), Z^{k-1}\}\, f\{x(k) \mid Z^{k-1}\}\, dx(k)
\end{aligned}
$$

where the vector limits $(-\infty, \infty)$ remind us that the integration is to be performed
over every element of the vector $x(k)$.

Collecting these results, Eq. (2.7) can (after much algebra) be simplified to be
a Gaussian PDF, with mean $\hat{x}(k|k)$ and covariance $P(k|k)$ [34:212-217]:

$$
f\{x(k) \mid Z^k\} = \mathcal{N}\{x(k);\ \hat{x}(k|k),\ P(k|k)\}
$$

where the mean $\hat{x}(k|k)$ and covariance $P(k|k)$ are described by the Kalman filter
measurement update equations:

$$
\begin{aligned}
\hat{x}(k|k) &= \hat{x}(k|k-1) + K(k)\,[z(k) - H(k)\,\hat{x}(k|k-1)] \\
P(k|k) &= P(k|k-1) - K(k)\,H(k)\,P(k|k-1)
\end{aligned}
\tag{2.8}
$$

with $K(k)$ referred to as the Kalman filter gain:

$$
K(k) = P(k|k-1)\,H(k)^T\,[H(k)\,P(k|k-1)\,H(k)^T + R(k)]^{-1}
\tag{2.9}
$$
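The propagation equations (2.6) and the update equations (2.8) and (2.9) can be sketched in plain Python as follows. To keep the example free of external libraries, it assumes a scalar measurement, so that the bracketed term in Eq. (2.9) is a scalar and its inverse is a division; the helper names, the constant-velocity matrices and the numerical values are all illustrative assumptions.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def add(a, b, sign=1.0):
    """Element-wise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def kf_predict(x, P, Phi, Gd, Qd):
    """Eq. (2.6): propagate the mean (column vector) and covariance."""
    x_pred = matmul(Phi, x)
    P_pred = add(matmul(matmul(Phi, P), transpose(Phi)),
                 matmul(matmul(Gd, Qd), transpose(Gd)))
    return x_pred, P_pred


def kf_update_scalar(x_pred, P_pred, z, H, R):
    """Eqs. (2.8)-(2.9) for a scalar measurement z with 1 x n matrix H."""
    PHt = matmul(P_pred, transpose(H))               # n x 1
    S = matmul(H, PHt)[0][0] + R                     # H P H^T + R (scalar here)
    K = [[row[0] / S] for row in PHt]                # Kalman gain, Eq. (2.9)
    residual = z - matmul(H, x_pred)[0][0]           # z - H x_pred
    x_new = add(x_pred, [[k[0] * residual] for k in K])
    P_new = add(P_pred, matmul(matmul(K, H), P_pred), sign=-1.0)
    return x_new, P_new


# Constant-velocity target, position-only measurement, unit sample interval.
Phi = [[1.0, 1.0], [0.0, 1.0]]
Gd = [[0.5], [1.0]]
Qd = [[0.01]]
H = [[1.0, 0.0]]
x, P = [[0.0], [1.0]], [[10.0, 0.0], [0.0, 10.0]]
x, P = kf_predict(x, P, Phi, Gd, Qd)
x, P = kf_update_scalar(x, P, z=1.2, H=H, R=1.0)
print(x, P)
```

For vector measurements the division by `S` would be replaced by a matrix inverse or, preferably, a linear solve from a numerical library.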

The form of the measurement update expression in Eq. (2.8) gives rise to the definition
of the filter residual, also referred to as the innovation [34:218]:

$$
\nu(k) = z(k) - H(k)\,\hat{x}(k|k-1)
\tag{2.10}
$$

Considering the form of Eq. (2.10), the residual consists of the difference between
the actual measurement, and the best prediction of the measurement: hence it embodies
the new information that the measurement provides to the system. If the
assumptions of the algorithm summarized in Eq. (2.5) are satisfied, then the residual
series should possess the following properties [34:228-229]:

$$
\begin{aligned}
\text{Zero-mean:}\quad & E\{\nu(k)\} = 0 \\
\text{White:}\quad & E\{\nu(k)\,\nu(l)^T\} = 0, \quad k \neq l \\
\text{Covariance:}\quad & E\{\nu(k)\,\nu(k)^T\} = S(k) = H(k)\,P(k|k-1)\,H(k)^T + R(k) \\
\text{Gaussian:}\quad & f\{\nu(k)\} = \mathcal{N}\{\nu(k);\ 0,\ S(k)\}
\end{aligned}
\tag{2.11}
$$

If the residual series does not display these characteristics, then there is a high
probability that the assumptions of the algorithm have not been satisfied. There
are two common causes of this in the target tracking application: target maneuver
and measurement ambiguity. If the maneuver that the target is exhibiting does not
match the maneuver described in the Kalman filter dynamics model (i.e., the matrix
$\Phi$ being used in the Kalman filter does not match the true $\Phi$ matrix), then the
mismatch will produce a residual series which does not possess the characteristics
described in Eq. (2.11). Methods of dealing with the problems caused by target
maneuver are described in Section 2.4. Similarly, if the system uses the incorrect
measurement then the residual series will not possess the characteristics of Eq. (2.11).
This is common in radar systems which produce several measurements during each
processing interval (some due to different targets, some due to clutter), but only
one measurement is correct, and the system does not know which is the correct
measurement. Methods for dealing with the problems of measurement uncertainty
and data association are described in Section 2.5.

Following from the properties of Eq. (2.11), the likelihood that a given residual
vector will occur is described by the PDF:

$$\mathcal{N}\{\boldsymbol{\nu}(k); \mathbf{0}, \mathbf{S}(k)\} = |2\pi\mathbf{S}(k)|^{-1/2} \exp\left\{-\tfrac{1}{2}\boldsymbol{\nu}(k)^T\mathbf{S}(k)^{-1}\boldsymbol{\nu}(k)\right\} \tag{2.12}$$

Following from this expression, we can define a region in $\boldsymbol{\nu}(k)$ space such that the
probability that a valid residual (resulting from the correct measurement and a
correctly modelled system) is outside of this region is very small. For example, if we
want the region $V$ to contain the most likely set of residuals such that the probability
that a residual lies within $V$ is $P_g$:$^2$

$$P_g = \int_{\boldsymbol{\nu}(k) \in V} |2\pi\mathbf{S}(k)|^{-1/2} \exp\left\{-\tfrac{1}{2}\boldsymbol{\nu}(k)^T\mathbf{S}(k)^{-1}\boldsymbol{\nu}(k)\right\} d\boldsymbol{\nu}(k) \tag{2.13}$$

then the region will be defined by:

$$V = \left\{\boldsymbol{\nu}(k) : \boldsymbol{\nu}(k)^T\mathbf{S}(k)^{-1}\boldsymbol{\nu}(k) < \gamma\right\} \tag{2.14}$$

where $\gamma$ is a threshold calculated to produce the desired probability $P_g$. There are
many interpretations of the distance measure defined by $\boldsymbol{\nu}(k)^T\mathbf{S}(k)^{-1}\boldsymbol{\nu}(k)$. If the
covariance $\mathbf{S}(k)$ is an identity matrix, then the distance measure reduces to the squared
norm of the residual, or equivalently the squared Euclidean distance between the
measurement and the predicted value of the measurement. In the multidimensional case,
it seems appropriate then to weight each principal axis according to the relative level
of certainty in that direction. Thus the inclusion of the predicted covariance modifies
the Euclidean inner product to a generalized inner product utilizing appropriate
weightings. The impact of the covariance inverse is to make the elements of the
vector independent and of unit variance; thus the resultant test statistic is distributed
according to a $\chi^2$ PDF, for which the number of degrees of freedom is the number of
measurement variables.
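The gate test of Eq. (2.14) can be illustrated with a minimal sketch for a two-dimensional measurement. It relies on the fact that the $\chi^2$ distribution with two degrees of freedom has the closed-form CDF $1 - \exp(-\gamma/2)$, so the threshold can be computed without tables. The function names and the numerical values are illustrative assumptions, not part of the original development.

```python
import math

def gate_threshold_2d(p_g):
    """Threshold gamma for a 2-D measurement: the chi-square CDF with two
    degrees of freedom is 1 - exp(-gamma / 2), which inverts exactly."""
    return -2.0 * math.log(1.0 - p_g)

def normalized_distance_2d(nu, S):
    """Return nu^T S^{-1} nu for a 2-vector nu and a 2x2 covariance S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    # Explicit inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    t0 = inv[0][0] * nu[0] + inv[0][1] * nu[1]
    t1 = inv[1][0] * nu[0] + inv[1][1] * nu[1]
    return nu[0] * t0 + nu[1] * t1

def in_gate(nu, S, p_g):
    """True if the residual nu lies inside the gate of Eq. (2.14)."""
    return normalized_distance_2d(nu, S) < gate_threshold_2d(p_g)

# Hypothetical residual covariance and residuals.
S = [[4.0, 0.0], [0.0, 1.0]]
gamma = gate_threshold_2d(0.99)
inside = in_gate([2.0, 1.0], S, 0.99)   # distance 4/4 + 1/1 = 2
outside = in_gate([0.0, 4.0], S, 0.99)  # distance 16/1 = 16
```

With $P_g = 0.99$ the threshold is roughly 9.21, so the first residual is accepted and the second is rejected even though both have similar Euclidean length.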

The Kalman filter is the optimal solution (according to almost any error criterion)
to the tracking problem if the system is linear and known, with additive white
Gaussian noise. The Kalman filter is also the optimal linear solution (according
to the minimum variance unbiased criterion) for any linear tracking problem,
regardless of the characteristics of the noise process [34:235].

$^2$ As discussed in Section 2.5.1, $P_g$ represents the probability that a measurement will lie within
a gating region.

There is much more to be said about the Kalman filter; this section has merely
stated the equations of the discrete-time variant. There are many topics which
should be understood in order to address the target tracking problem, particularly
generation of the discrete-time model for an underlying continuous-time system, and
modelling of time-correlated noise processes. Countless books have been written on
these subjects, and the interested reader is directed to the thorough coverage of the
area which can be found in [34].

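The continuous-time system underlying the discrete-time system of Eq. (2.5)
will be of the form:

$$\begin{aligned}
\dot{\mathbf{x}}(t) &= \mathbf{F}(t)\mathbf{x}(t) + \mathbf{G}(t)\mathbf{w}(t) \\
\mathbf{z}(k) &= \mathbf{H}(k)\mathbf{x}(k) + \mathbf{v}(k)
\end{aligned} \tag{2.15}$$

where $\mathbf{x}(k) = \mathbf{x}(t_k)$ for notational convenience, and $\mathbf{w}(t)$ is a continuous-time white
noise process such that:

$$E\{\mathbf{w}(t)\mathbf{w}(t')^T\} = \mathbf{Q}(t)\,\delta(t - t')$$

and $\delta(t - t')$ is the Dirac delta function:

$$\delta(t) = 0, \quad t \neq 0; \qquad \int_{-\infty}^{\infty} \delta(t)\,dt = 1$$

The Kalman filter propagation equations for this system are equivalent to the
discrete-time propagation equations [33:209]:

$$\begin{aligned}
\hat{\mathbf{x}}(k|k-1) &= \boldsymbol{\Phi}(t_k, t_{k-1})\,\hat{\mathbf{x}}(k-1|k-1) \\
\mathbf{P}(k|k-1) &= \boldsymbol{\Phi}(t_k, t_{k-1})\,\mathbf{P}(k-1|k-1)\,\boldsymbol{\Phi}(t_k, t_{k-1})^T \\
&\quad + \int_{t_{k-1}}^{t_k} \boldsymbol{\Phi}(t_k, \tau)\mathbf{G}(\tau)\mathbf{Q}(\tau)\mathbf{G}(\tau)^T\boldsymbol{\Phi}(t_k, \tau)^T\,d\tau
\end{aligned} \tag{2.16}$$

which gives rise to the definition of the $\mathbf{G}_d$ and $\mathbf{Q}_d$ matrices, which satisfy:

$$\mathbf{G}_d(k-1)\mathbf{Q}_d(k-1)\mathbf{G}_d(k-1)^T = \int_{t_{k-1}}^{t_k} \boldsymbol{\Phi}(t_k, \tau)\mathbf{G}(\tau)\mathbf{Q}(\tau)\mathbf{G}(\tau)^T\boldsymbol{\Phi}(t_k, \tau)^T\,d\tau$$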

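The state transition matrix $\boldsymbol{\Phi}(t, t_0)$ is the solution of the deterministic differential
equation [34:40]:

$$\frac{d\boldsymbol{\Phi}(t, t_0)}{dt} = \mathbf{F}(t)\,\boldsymbol{\Phi}(t, t_0)$$

from the initial condition $\boldsymbol{\Phi}(t_0, t_0) = \mathbf{I}$, and $\boldsymbol{\Phi}(k|k-1) = \boldsymbol{\Phi}(t_k, t_{k-1})$ for notational
convenience. If the dynamical system is time invariant such that $\mathbf{F}(t) = \mathbf{F}$, then
$\boldsymbol{\Phi}(t, t_0)$ can be shown to be [34:42]:

$$\boldsymbol{\Phi}(t, t_0) = \exp(\mathbf{F} \cdot [t - t_0])$$

where $\exp(\cdot)$ denotes the matrix exponential operation.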
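As a concrete illustration of the time-invariant case, the sketch below approximates $\exp(\mathbf{F} \cdot \Delta t)$ with a truncated Taylor series and applies it to a constant-velocity model with state (position, velocity). The model, the function names and the number of series terms are assumptions made for illustration; a production system would normally use a dedicated matrix exponential routine.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def transition_matrix(F, dt, terms=20):
    """Approximate exp(F * dt) by summing the first `terms` Taylor terms."""
    n = len(F)
    Fdt = [[F[i][j] * dt for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # current term, starts as identity
    for order in range(1, terms):
        # term becomes (F dt)^order / order!
        term = mat_mul(term, Fdt)
        term = [[v / order for v in row] for row in term]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Constant-velocity model: d(position)/dt = velocity, d(velocity)/dt = 0.
F = [[0.0, 1.0], [0.0, 0.0]]
Phi = transition_matrix(F, 2.0)
```

Because $\mathbf{F}^2 = \mathbf{0}$ for this model, the series terminates after the linear term and the result is the familiar matrix with ones on the diagonal and $\Delta t$ in the upper-right entry.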

2.2.4  Nonlinear  Filters.  While  the  body  of  knowledge  for  dealing  with 
linear  systems  is  very  extensive,  few  systems  are  truly  linear  in  reality,  but  rather 
they  can  be  modelled  as  linear  within  certain  operating  regions.  When  it  is  necessary 
to  operate  outside  these  regions,  linear  techniques  break  down  and  more  advanced 
nonlinear  estimator  forms  become  necessary.  The  most  common  nonlinear  estimation 
algorithm is the Extended Kalman Filter (EKF) [3, 35].

The EKF is designed to address nonlinearities of the following form:

$$\begin{aligned}
\dot{\mathbf{x}}(t) &= \mathbf{f}[\mathbf{x}(t), t] + \mathbf{G}(t)\mathbf{w}(t) \\
\mathbf{z}(k) &= \mathbf{h}[\mathbf{x}(k), k] + \mathbf{v}(k)
\end{aligned} \tag{2.17}$$

where the first equation is in continuous-time form, similarly to Eq. (2.15), as nonlinear
systems cannot in general be converted to equivalent discrete-time systems. The
vector function $\mathbf{f}[\mathbf{x}(t), t]$ represents the nonlinear dynamics model of the system,
while the vector function $\mathbf{h}[\mathbf{x}(k), k]$ represents the nonlinear measurement model of
the system. The noise processes for both equations remain additive, this being the
major restriction of the EKF technique.

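Using the EKF, time propagation between measurement samples $(k-1)$ (occurring
at time $t_{k-1}$) and $k$ (occurring at time $t_k$) must be performed using numerical
integration of the following expressions [35:44-45]:

$$\begin{aligned}
\dot{\hat{\mathbf{x}}}(t|t_{k-1}) &= \mathbf{f}[\hat{\mathbf{x}}(t|t_{k-1}), t] \\
\dot{\mathbf{P}}(t|t_{k-1}) &= \mathbf{F}[\hat{\mathbf{x}}(t|t_{k-1}), t]\,\mathbf{P}(t|t_{k-1}) + \mathbf{P}(t|t_{k-1})\,\mathbf{F}[\hat{\mathbf{x}}(t|t_{k-1}), t]^T \\
&\quad + \mathbf{G}(t)\mathbf{Q}(t)\mathbf{G}(t)^T
\end{aligned} \tag{2.18}$$

where the matrix $\mathbf{F}[\hat{\mathbf{x}}(t|t_{k-1}), t]$ represents the linearization of the vector function
$\mathbf{f}[\hat{\mathbf{x}}(t|t_{k-1}), t]$ with respect to the parameter $\hat{\mathbf{x}}(t|t_{k-1})$, reevaluated over each
numerical integration step:

$$\mathbf{F}[\hat{\mathbf{x}}(t|t_{k-1}), t] = \frac{\partial \mathbf{f}[\hat{\mathbf{x}}(t|t_{k-1}), t]}{\partial \hat{\mathbf{x}}(t|t_{k-1})}$$

The measurement update equation for the EKF is very similar to the standard
Kalman filter measurement update form of Eq. (2.8) [35:44]:

$$\begin{aligned}
\hat{\mathbf{x}}(k|k) &= \hat{\mathbf{x}}(k|k-1) + \mathbf{K}(k)\{\mathbf{z}(k) - \mathbf{h}[\hat{\mathbf{x}}(k|k-1), k]\} \\
\mathbf{P}(k|k) &= \mathbf{P}(k|k-1) - \mathbf{K}(k)\,\mathbf{H}[\hat{\mathbf{x}}(k|k-1), k]\,\mathbf{P}(k|k-1)
\end{aligned} \tag{2.19}$$

where the Kalman filter gain remains as per the standard Kalman filter:

$$\begin{aligned}
\mathbf{K}(k) &= \mathbf{P}(k|k-1)\,\mathbf{H}[\hat{\mathbf{x}}(k|k-1), k]^T \cdot \\
&\quad \cdot \left\{\mathbf{H}[\hat{\mathbf{x}}(k|k-1), k]\,\mathbf{P}(k|k-1)\,\mathbf{H}[\hat{\mathbf{x}}(k|k-1), k]^T + \mathbf{R}(k)\right\}^{-1}
\end{aligned} \tag{2.20}$$

and the matrix $\mathbf{H}[\hat{\mathbf{x}}(k|k-1), k]$ is the linearization of the vector function
$\mathbf{h}[\hat{\mathbf{x}}(k|k-1), k]$ with respect to the parameter $\hat{\mathbf{x}}(k|k-1)$:

$$\mathbf{H}[\hat{\mathbf{x}}(k|k-1), k] = \frac{\partial \mathbf{h}[\hat{\mathbf{x}}(k|k-1), k]}{\partial \hat{\mathbf{x}}(k|k-1)}$$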

If either the dynamics model or the measurement model is linear, then the
standard Kalman filter equations may be used for that portion of the processing cycle.
In this study, we will neglect the effect of nonlinearities in order to concentrate purely
on the impact of the problems caused by target maneuver and data association.


2.3  Gaussian  Mixtures 

The Gaussian mixture is a powerful modelling tool for characterizing the PDF
of variables which follow complicated multi-modal distributions. The basic form of
a Gaussian mixture containing $N$ components is:

$$f\{\mathbf{x}\} = \sum_{i=1}^{N} p_i\,\mathcal{N}\{\mathbf{x}; \hat{\mathbf{x}}_i, \mathbf{P}_i\} \tag{2.21}$$

where $\{p_i\}$ are the relative weights of each Gaussian component ($p_i > 0\ \forall i$,
$\sum_{i=1}^{N} p_i = 1$), $\{\hat{\mathbf{x}}_i\}$ are the means of each component, and $\{\mathbf{P}_i\}$ are the covariances.

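As will be seen in the following sections, Gaussian mixture models arise naturally
as the solution to several problems in target tracking, including maneuvering
target tracking and data association. In the coming sections it will often be necessary
to calculate the overall mean and overall covariance of a Gaussian mixture. The
overall mean can be calculated using the expectation operation:

$$\begin{aligned}
\boldsymbol{\mu}_c = E\{\mathbf{x}\} &= \int_{-\infty}^{\infty} \mathbf{x} \left\{\sum_{i=1}^{N} p_i\,\mathcal{N}\{\mathbf{x}; \hat{\mathbf{x}}_i, \mathbf{P}_i\}\right\} d\mathbf{x} \\
&= \sum_{i=1}^{N} p_i \int_{-\infty}^{\infty} \mathbf{x}\,\mathcal{N}\{\mathbf{x}; \hat{\mathbf{x}}_i, \mathbf{P}_i\}\,d\mathbf{x} \\
&= \sum_{i=1}^{N} p_i\,\hat{\mathbf{x}}_i
\end{aligned} \tag{2.22}$$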


The  overall  covariance  can  be  calculated  similarly: 


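$$\begin{aligned}
\mathbf{P}_c &= E\{(\mathbf{x} - \boldsymbol{\mu}_c)(\mathbf{x} - \boldsymbol{\mu}_c)^T\} = E\{\mathbf{x}\mathbf{x}^T\} - \boldsymbol{\mu}_c\boldsymbol{\mu}_c^T \\
&= \int_{-\infty}^{\infty} \sum_{i=1}^{N} p_i\,\mathbf{x}\mathbf{x}^T\,\mathcal{N}\{\mathbf{x}; \hat{\mathbf{x}}_i, \mathbf{P}_i\}\,d\mathbf{x} - \boldsymbol{\mu}_c\boldsymbol{\mu}_c^T \\
&= \sum_{i=1}^{N} p_i \int_{-\infty}^{\infty} \mathbf{x}\mathbf{x}^T\,\mathcal{N}\{\mathbf{x}; \hat{\mathbf{x}}_i, \mathbf{P}_i\}\,d\mathbf{x} - \boldsymbol{\mu}_c\boldsymbol{\mu}_c^T \\
&= \sum_{i=1}^{N} p_i \left(\mathbf{P}_i + \hat{\mathbf{x}}_i\hat{\mathbf{x}}_i^T\right) - \boldsymbol{\mu}_c\boldsymbol{\mu}_c^T \\
&= \sum_{i=1}^{N} p_i \left[\mathbf{P}_i + (\hat{\mathbf{x}}_i - \boldsymbol{\mu}_c)(\hat{\mathbf{x}}_i - \boldsymbol{\mu}_c)^T\right]
\end{aligned} \tag{2.23}$$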
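The two results above are easy to check numerically. The following sketch evaluates Eqs. (2.22) and (2.23) for a scalar mixture (so each covariance is a variance) and cross-checks the final form of Eq. (2.23) against its second-to-last line. The weights, means and variances are arbitrary illustrative values.

```python
def mixture_moments(weights, means, variances):
    """Overall mean (Eq. 2.22) and variance (Eq. 2.23) of a scalar mixture."""
    mu_c = sum(p * m for p, m in zip(weights, means))
    var_c = sum(p * (v + (m - mu_c) ** 2)
                for p, m, v in zip(weights, means, variances))
    return mu_c, var_c

# Hypothetical three-component mixture; the weights sum to one.
weights = [0.5, 0.3, 0.2]
means = [0.0, 2.0, -1.0]
variances = [1.0, 0.5, 2.0]

mu_c, var_c = mixture_moments(weights, means, variances)

# Alternative form: sum_i p_i (P_i + x_i^2) - mu_c^2.
var_alt = sum(p * (v + m * m)
              for p, m, v in zip(weights, means, variances)) - mu_c ** 2
```

Both forms give an overall mean of 0.4 and an overall variance of 2.29 for these values; the second term in each bracket is the "spread of the means" contribution that makes the mixture variance larger than the weighted average of the component variances.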
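The major theme of this thesis is to simplify a Gaussian mixture with many
components to a reduced form with fewer components. One of the common building
blocks for performing this function will be to merge two similar mixture components
together. In order to maintain the same overall mean and covariance for the mixture,
the parameters of the combined component will be [6:293]:

$$\begin{aligned}
\text{Weight:} &\quad p_c = p_1 + p_2 \\
\text{Mean:} &\quad \boldsymbol{\mu}_c = \frac{p_1\hat{\mathbf{x}}_1 + p_2\hat{\mathbf{x}}_2}{p_1 + p_2} \\
\text{Covariance:} &\quad \mathbf{P}_c = \frac{1}{p_1 + p_2}\left[p_1\mathbf{P}_1 + p_2\mathbf{P}_2 + \frac{p_1 p_2}{p_1 + p_2}(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)^T\right]
\end{aligned} \tag{2.24}$$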


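The derivation of $\boldsymbol{\mu}_c$ in Eq. (2.24) follows directly from Eq. (2.22); $\mathbf{P}_c$ in
Eq. (2.24) is derived from Eq. (2.23) by:

$$\begin{aligned}
\mathbf{P}_c &= \frac{\sum_{i=1}^{2} p_i \left[\mathbf{P}_i + (\hat{\mathbf{x}}_i - \boldsymbol{\mu}_c)(\hat{\mathbf{x}}_i - \boldsymbol{\mu}_c)^T\right]}{p_1 + p_2} \\
&= \left[p_1\mathbf{P}_1 + p_1(\hat{\mathbf{x}}_1 - \boldsymbol{\mu}_c)(\hat{\mathbf{x}}_1 - \boldsymbol{\mu}_c)^T + p_2\mathbf{P}_2 + p_2(\hat{\mathbf{x}}_2 - \boldsymbol{\mu}_c)(\hat{\mathbf{x}}_2 - \boldsymbol{\mu}_c)^T\right] / (p_1 + p_2)
\end{aligned} \tag{2.25}$$

Expanding $\boldsymbol{\mu}_c$ using Eq. (2.24):

$$\begin{aligned}
\hat{\mathbf{x}}_1 - \boldsymbol{\mu}_c &= \hat{\mathbf{x}}_1 - (p_1\hat{\mathbf{x}}_1 + p_2\hat{\mathbf{x}}_2)/(p_1 + p_2) \\
&= (p_1\hat{\mathbf{x}}_1 + p_2\hat{\mathbf{x}}_1 - p_1\hat{\mathbf{x}}_1 - p_2\hat{\mathbf{x}}_2)/(p_1 + p_2) \\
&= p_2(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)/(p_1 + p_2)
\end{aligned} \tag{2.26}$$

Similarly:

$$\begin{aligned}
\hat{\mathbf{x}}_2 - \boldsymbol{\mu}_c &= \hat{\mathbf{x}}_2 - (p_1\hat{\mathbf{x}}_1 + p_2\hat{\mathbf{x}}_2)/(p_1 + p_2) \\
&= (p_1\hat{\mathbf{x}}_2 + p_2\hat{\mathbf{x}}_2 - p_1\hat{\mathbf{x}}_1 - p_2\hat{\mathbf{x}}_2)/(p_1 + p_2) \\
&= -p_1(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)/(p_1 + p_2)
\end{aligned} \tag{2.27}$$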

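Substituting these expressions into Eq. (2.25) we obtain:

$$\begin{aligned}
\mathbf{P}_c &= \left[p_1\mathbf{P}_1 + \frac{p_1 p_2^2}{(p_1 + p_2)^2}(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)^T + p_2\mathbf{P}_2 + \frac{p_2 p_1^2}{(p_1 + p_2)^2}(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)^T\right] / (p_1 + p_2) \\
&= \left[p_1\mathbf{P}_1 + p_2\mathbf{P}_2 + \frac{p_1 p_2 (p_1 + p_2)}{(p_1 + p_2)^2}(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)^T\right] / (p_1 + p_2) \\
&= \left[p_1\mathbf{P}_1 + p_2\mathbf{P}_2 + \frac{p_1 p_2}{p_1 + p_2}(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)(\hat{\mathbf{x}}_1 - \hat{\mathbf{x}}_2)^T\right] / (p_1 + p_2)
\end{aligned} \tag{2.28}$$

which matches the result shown in Eq. (2.24).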
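The merging rule of Eq. (2.24) can be sketched for the scalar case as follows. The example merges two components of a three-component mixture and confirms that the overall mean and variance of the mixture are unchanged. The helper names and component values are illustrative assumptions.

```python
def merge_pair(p1, x1, P1, p2, x2, P2):
    """Moment-preserving merge of two scalar components, per Eq. (2.24)."""
    p_c = p1 + p2
    x_c = (p1 * x1 + p2 * x2) / p_c
    P_c = (p1 * P1 + p2 * P2 + (p1 * p2 / p_c) * (x1 - x2) ** 2) / p_c
    return p_c, x_c, P_c

def mixture_moments(components):
    """Overall mean and variance of a list of (weight, mean, variance)."""
    mu = sum(p * x for p, x, _ in components)
    var = sum(p * (P + (x - mu) ** 2) for p, x, P in components)
    return mu, var

# Hypothetical mixture: the second and third components are similar.
full = [(0.5, 0.0, 1.0), (0.3, 2.0, 0.5), (0.2, 2.5, 0.8)]
merged = merge_pair(*full[1], *full[2])
reduced = [full[0], merged]

moments_before = mixture_moments(full)
moments_after = mixture_moments(reduced)
```

For these values the merged component has weight 0.5, mean 2.2 and variance 0.68, and both the original and the reduced mixture have overall mean 1.1 and overall variance 2.05. Only the first two moments are preserved; the shape of the PDF generally changes.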

2.4 Multiple Model Adaptive Estimation

The techniques described in Section 2.2 are extremely effective at tracking
moving  objects  when  the  assumptions  of  the  algorithm  are  satisfied.  However,  one 
assumption  inherent  to  those  methods  is  that  the  dynamics  model  of  the  target  is 
known  for  all  time.  In  any  scenario  of  practical  interest,  this  will  never  be  the  case, 
and  thus  any  claim  of  optimality  (or  even  near-optimality)  is  lost. 

In the target tracking application, there are two fundamental classes of models:
maneuvering and non-maneuvering. Non-maneuvering models are used to exploit the
fact that most aircraft fly along straight paths most of the time: such knowledge
brings intrinsic certainty into the estimation problem, which can be used to reduce
the bandwidth of the tracking filter and greatly increase the precision of state
estimates.

Maneuvering models are needed in an estimation system to describe the motion
of the target when it is maneuvering. The rich variety of maneuvers that a
target may exhibit (particularly in a military setting) raises the level of uncertainty
in the system by comparison. While some segments can be predicted with
some accuracy for short periods of time, to some extent the maneuver will need to
be modelled as a noise process with appropriately selected strength and bandwidth
(through an appropriately designed noise process shaping filter).

Fundamentally,  target  trackers  may  use  two  strategies  to  adapt  to  changing 
dynamics  models.  The  first  approach  is  to  use  the  measurement  to  estimate  the 
unknown  maneuver  parameters  (often  using  batch  processing  or  sliding  window- 
type  methods),  and  then  correct  the  state  estimates  using  these  parameters.  The 
disadvantage  of  this  method  is  that  it  is  slow  to  adapt  to  change:  it  takes  several 
sample  periods  after  the  onset  of  a  maneuver  to  be  able  to  estimate  the  maneuver 
parameters  with  any  level  of  accuracy.  Iterative  reprocessing  of  the  last  measurement 
or  the  last  N  measurements  can  reduce  this  lag,  but  not  without  a  significant  increase 
in  computational  complexity. 

The second approach for adapting to changing target dynamics is to use a parallel
bank of non-adaptive estimators, each tuned to a different operating condition
(e.g., type and level of maneuver, etc.), and then to combine the outputs into a single
weighted average estimate based on the apparent performance of each elemental
filter. This latter architecture has a major advantage over the former in regards
to adaptation speed: the question of “what is the maneuver?” has effectively been
changed to “is the maneuver best represented by model a, model b, or model c?”,
thereby re-posing the question as a detection problem, rather than an estimation
problem. Virtually all multiple model techniques share this same basic architecture,
and differ only in the manner in which the model weights are calculated, and in the
mixing of model-conditioned estimates between processing cycles.

2.4.1 Non-Switching Models. The “basic” form of Multiple Model Adaptive
Estimator (MMAE) [3, 37] is derived when one assumes that the model in force does
not change with time. While at first this may seem to defeat the purpose of employing
multiple model methods, the result reveals insight into ad hoc modifications which
can be used to transform the structure into an effective adaptive algorithm.

2.4.1.1 Calculation of Model Probabilities. The event $M_j$ is defined
to represent the condition that dynamics model $j$ is in force. No time argument
is required, as the model is assumed not to switch with time. The a posteriori
probability that model $j$ is in force conditioned on the measurement history up to
and including sample time $k$ is represented by:

$$\mu_j(k) = P\{M_j | Z^k\} \tag{2.29}$$

Expanding $Z^k$ in Eq. (2.29) into the combination of the previous measurement
history $Z^{k-1}$ and the current measurement $\mathbf{z}(k)$, and then using Bayes'
rule in both $\mathbf{z}(k)$ and $M_j$ yields:

$$\begin{aligned}
\mu_j(k) &= P\{M_j | Z^{k-1}, \mathbf{z}(k)\} \\
&= \frac{f\{M_j, \mathbf{z}(k) | Z^{k-1}\}}{f\{\mathbf{z}(k) | Z^{k-1}\}} \\
&= \frac{f\{\mathbf{z}(k) | M_j, Z^{k-1}\}\,P\{M_j | Z^{k-1}\}}{f\{\mathbf{z}(k) | Z^{k-1}\}}
\end{aligned} \tag{2.30}$$


where the notation $P\{\cdot\}$ refers to the probability of a discrete event, whereas $f\{\cdot\}$
refers to the density function of a continuous variable. The denominator in Eq. (2.30)
can be expanded using the total probability expansion over all models:

$$f\{\mathbf{z}(k) | Z^{k-1}\} = \sum_{i=1}^{N_f} f\{\mathbf{z}(k) | M_i, Z^{k-1}\}\,P\{M_i | Z^{k-1}\} \tag{2.31}$$

where $N_f$ is the number of hypothesized models (and thus the number of elemental
filters in the structure). This gives the following recursive equation for the model
probabilities $\mu_j(k)$:

$$\mu_j(k) = \frac{f\{\mathbf{z}(k) | M_j, Z^{k-1}\}\,\mu_j(k-1)}{\sum_{i=1}^{N_f} f\{\mathbf{z}(k) | M_i, Z^{k-1}\}\,\mu_i(k-1)} \tag{2.32}$$

The  representation  of  the  parameter  space  by  a  number  of  discrete  models  is 
effectively  an  employment  of  the  total  probability  theorem,  representing  the  PDF 
of the new measurement as a weighted sum over each possible model. The total
probability theorem requires two characteristics of the event space partitioning:
the  partitions  should  be  mutually  exclusive  (they  should  not  overlap  within  the 
space),  and  they  should  be  complete  (they  should  span  the  entire  event  space). 
Although  these  requirements  will  seldom  be  met  in  practical  multiple  model  filter 
designs,  they  remain  valuable  design  guidelines:  that  the  models  chosen  should  be 
distinct,  such  that  a  single  model  should  be  predominantly  responding  at  any  one 
time,  and  complete,  such  that  the  elemental  filters  adequately  model  all  possible 
hypotheses  that  may  feasibly  occur. 

The calculation of $f\{\mathbf{z}(k) | M_j, Z^{k-1}\}$ is a simple matter for the non-switching
model case; this density function represents the match between the incoming
measurement and the previous measurement history assuming that model $j$ has been in
force throughout. Assuming the standard linear Kalman filter measurement model,
this is evaluated as:

$$f\{\mathbf{z}(k) | M_j, Z^{k-1}\} = \mathcal{N}\{\mathbf{z}(k); \mathbf{H}_j\hat{\mathbf{x}}_j(k|k-1), \mathbf{H}_j\mathbf{P}_j(k|k-1)\mathbf{H}_j^T + \mathbf{R}_j\} \tag{2.33}$$

where $\hat{\mathbf{x}}_j(k|k-1)$ and $\mathbf{P}_j(k|k-1)$ are the state estimate and covariance of the
filter for model $j$, and $\mathbf{H}_j$ and $\mathbf{R}_j$ are the measurement matrix and measurement
covariance under model $j$.

2.4.1.2 Calculation of Combined Estimate. The central conditional
mean estimate is formed as a weighted average of the elemental filter estimates using
the model probabilities $\mu_j(k)$ as the weights:

$$\hat{\mathbf{x}}(k|k) = \sum_{j=1}^{N_f} \mu_j(k)\,\hat{\mathbf{x}}_j(k|k) \tag{2.34}$$

Though generally not required, the covariance of this estimate can also be
formed using a weighted average, but adding the correction term which takes into
account the spreading introduced by the different estimates:

$$\mathbf{P}(k|k) = \sum_{j=1}^{N_f} \mu_j(k)\left\{\mathbf{P}_j(k|k) + [\hat{\mathbf{x}}_j(k|k) - \hat{\mathbf{x}}(k|k)][\hat{\mathbf{x}}_j(k|k) - \hat{\mathbf{x}}(k|k)]^T\right\} \tag{2.35}$$

Figure 2.1 shows the structure of the non-switching MMAE algorithm. The
model-conditioned estimates calculated by each elemental filter at each processing
cycle are passed directly into the same filter at the following processing cycle, as it
is assumed that the model in force does not change with time. The overall combined
estimate is calculated as a weighted average, as shown in Eqs. (2.34) and (2.35).
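A single step of the non-switching recursion can be sketched for scalar measurements and scalar states. The first function applies Eq. (2.32) using the Gaussian likelihood of Eq. (2.33); the second forms the combined estimate of Eqs. (2.34) and (2.35). The function names and all numerical inputs are illustrative assumptions, and the elemental Kalman filters that would supply the predicted and updated values are not shown.

```python
import math

def gaussian_pdf(z, mean, var):
    """Scalar Gaussian density N{z; mean, var}."""
    return math.exp(-0.5 * (z - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def update_model_probabilities(mu_prev, z, predicted_means, residual_vars):
    """One step of Eq. (2.32): weight each prior probability by the
    model-conditioned measurement likelihood, then normalize."""
    likelihoods = [gaussian_pdf(z, m, s)
                   for m, s in zip(predicted_means, residual_vars)]
    unnormalized = [l * mu for l, mu in zip(likelihoods, mu_prev)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def combine(mu, estimates, variances):
    """Combined estimate and variance, Eqs. (2.34) and (2.35)."""
    x = sum(m * xj for m, xj in zip(mu, estimates))
    P = sum(m * (Pj + (xj - x) ** 2)
            for m, xj, Pj in zip(mu, estimates, variances))
    return x, P

# Hypothetical two-model example: model 0 predicts the measurement exactly,
# model 1 is two standard deviations away.
mu = update_model_probabilities([0.5, 0.5], z=1.0,
                                predicted_means=[1.0, 3.0],
                                residual_vars=[1.0, 1.0])
x_c, P_c = combine(mu, estimates=[1.2, 2.8], variances=[0.5, 0.5])
```

Here the probability of model 0 rises from 0.5 to about 0.88 after one measurement, and the combined variance exceeds 0.5 because of the spread between the two model-conditioned estimates.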

2.4.1.3 Ad Hoc Modifications. The assumption of the algorithm that
the  model  in  force  will  not  change  is  evident  in  the  form  of  Eq.  (2.32):  the  recursive 
nature  of  the  formulation  implies  that  the  certainty  of  the  system  will  grow  with 
time,  as  the  a  priori  model  probabilities  are  multiplied  by  the  model-conditioned 
measurement  density  function  at  each  processing  cycle.  To  illustrate  the  difficulty 
that  this  can  cause,  consider  a  model  which  consistently  performs  poorly  (as  it  is 
badly  mismatched  to  the  true  system).  As  time  progresses,  the  probability  of  the 
model  decreases  exponentially,  as  the  certainty  with  which  the  model  is  rejected 
grows. If the true system switches models (e.g., a non-maneuvering target commences
an evasive maneuver), then many orders of magnitude of certainty with
which the new model was being rejected (through the recursion) must be overcome
before a significant amount of probability will return to it. If the probability of
the model had decreased to such a level that a numerical underflow condition had
occurred and the value had been rounded to zero, then the model probability will
never recover.

Figure 2.1. Block diagram of non-switching multiple model estimation algorithm.
[Diagram not reproduced: each of the $N_f$ elemental filters carries its own estimate
and covariance from $(k-1|k-1)$ through $(k|k-1)$ to $(k|k)$ and on to the next
processing cycle; the updated estimates feed a weighted combination that produces
the combined estimate.]

An  obvious  method  of  overcoming  this  difficulty  is  to  impose  a  lower  bound 
on  the  model  probabilities  such  that  any  probabilities  that  fall  below  the  bound  are 
increased  back  to  that  level.  The  level  of  the  bound  can  be  adjusted  experimentally, 
providing  a  trade-off  between  speed  of  adaptation,  and  level  of  certainty  accrued  by 
the  estimator.  Higher  bounds  will  increase  the  agility  of  probability  flow  between 
models  while  making  the  system  more  susceptible  to  incorrect  probability  flow  due 
to noise, whereas lower bounds will slow the adaptation process while providing more
robustness against noise. An example of this is shown in Figure 2.2, in which the
non-maneuvering model is in effect from samples 0 to 20 and from 61 to 100, and the
maneuvering model is in effect from samples 21 to 60. The top diagram shows the
difficulty experienced when no lower bound is applied: by the time the maneuvering
model comes into force (at the 21st sample), its probability has reduced to $10^{-75}$,
and it takes nearly 30 samples for this probability to recover and return to being
competitive with the non-maneuvering model. The bottom diagram shows the same
scenario, but with a lower bound of $10^{-3}$ applied to the model probabilities. The
use of the lower bound reduces the time required to respond to the model switch to
around five sample periods.

Figure 2.2. MMAE probability flow with and without a lower probability bound.
Note the logarithmic scale used in each of the plots. [Plots not reproduced. Top
panel: "Probability Flow - No Lower Bound"; bottom panel: "Probability Flow -
$10^{-3}$ Lower Bound".]
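The effect of the lower probability bound can be reproduced qualitatively with a toy recursion. The sketch below does not simulate the scenario of Figure 2.2: the per-sample likelihoods are made-up constants (one model fits ten times better than the other), and the 0.9 recovery threshold is an arbitrary choice. It only shows how clamping the probabilities shortens the time needed to respond to a model switch.

```python
def run(likelihoods, lower_bound=0.0):
    """Apply the recursion of Eq. (2.32) to two models, optionally clamping
    each probability to `lower_bound` and renormalizing afterwards."""
    mu = [0.5, 0.5]
    history = []
    for f in likelihoods:
        mu = [fi * mi for fi, mi in zip(f, mu)]
        total = sum(mu)
        mu = [m / total for m in mu]
        if lower_bound > 0.0:
            mu = [max(m, lower_bound) for m in mu]
            total = sum(mu)
            mu = [m / total for m in mu]
        history.append(mu)
    return history

def samples_to_recover(history, switch, threshold=0.9):
    """Number of samples after the switch until model 1 exceeds threshold."""
    for n, mu in enumerate(history[switch:], start=1):
        if mu[1] > threshold:
            return n
    return None

# Hypothetical likelihood pairs (model 0, model 1): model 0 matches for the
# first 20 samples, model 1 matches for the following 30.
likelihoods = [(1.0, 0.1)] * 20 + [(0.1, 1.0)] * 30

unbounded = samples_to_recover(run(likelihoods), switch=20)
bounded = samples_to_recover(run(likelihoods, lower_bound=1e-3), switch=20)
```

Without the bound, model 1 must first undo the twenty decades of probability it lost, so it needs 21 samples to dominate; with the $10^{-3}$ bound it needs only 4.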
The second ad hoc modification required to use the algorithm in practice is
to monitor the estimates of badly-performing models for divergence, and reinitialize
them if this is sensed to occur. Traditionally the trigger used for this has been the
normalized residual, $\boldsymbol{\nu}_j^T\mathbf{S}_j^{-1}\boldsymbol{\nu}_j$, as utilized in Eq. (2.14). As discussed in Section
2.2.3, this value provides an indication of the match between the measurement and
the value predicted by the model, hence if this value is large (above some threshold),
then the model can be assumed to have diverged and should be reinitialized using the
combined estimate from the non-divergent filters. An alternative trigger which could
be used for reinitializing models is to restart them whenever the lower probability
bound is applied [57]. In this way, elemental filters which are not contributing to the
overall estimate are continually reset such that they are ready when the respective
model comes into force.

2.4.2 Switching Models. To allow for model switching, the model in force
is permitted to change at any sample instant, and model history events are used to
characterize the transitional behavior of the system with time. Such events take the
form:$^3$

$$M^{k,l} = \left\{M_{1,m_{1,l}}, M_{2,m_{2,l}}, \ldots, M_{k,m_{k,l}}\right\} \tag{2.36}$$

The notation is interpreted as meaning that the $l$-th possible model history at time
$k$ consists of model $m_{1,l}$ at sample time 1, model $m_{2,l}$ at sample time 2, etc., where
each $m_{i,l}$ is the index to a model number between 1 and $N_f$.

$^3$ Note that superscripts are used to indicate a model history event, whereas subscripts indicate
a single time step event.

If transitions are allowed to any of the $N_f$ models at any sample instant,
then every model history event at time $k$ will give rise to $N_f$ new events at time
$(k+1)$, hence the number of possible model histories increases exponentially with
time according to $N_f^k$. The PDF of the target state conditioned on the measurements
must then be calculated as a total probability expansion over all model history events:


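$$f\{\mathbf{x}(k) | Z^k\} = \sum_{l=1}^{N_f^k} f\{\mathbf{x}(k) | M^{k,l}, Z^k\}\,P\{M^{k,l} | Z^k\} \tag{2.37}$$

The model history probability $P\{M^{k,l} | Z^k\}$ is expanded as:

$$\begin{aligned}
P\{M^{k,l} | Z^k\} &= P\{M^{k,l} | \mathbf{z}(k), Z^{k-1}\} \\
&= \frac{f\{M^{k,l}, \mathbf{z}(k) | Z^{k-1}\}}{f\{\mathbf{z}(k) | Z^{k-1}\}} \\
&= \frac{f\{\mathbf{z}(k) | M^{k,l}, Z^{k-1}\}\,P\{M^{k,l} | Z^{k-1}\}}{f\{\mathbf{z}(k) | Z^{k-1}\}} \\
&= \frac{f\{\mathbf{z}(k) | M^{k,l}, Z^{k-1}\}\,P\{M_{k,j}, M^{k-1,l'} | Z^{k-1}\}}{f\{\mathbf{z}(k) | Z^{k-1}\}} \\
&= \frac{f\{\mathbf{z}(k) | M^{k,l}, Z^{k-1}\}\,P\{M_{k,j} | M^{k-1,l'}, Z^{k-1}\}\,P\{M^{k-1,l'} | Z^{k-1}\}}{f\{\mathbf{z}(k) | Z^{k-1}\}}
\end{aligned} \tag{2.38}$$

where $l$ is the index of the current model history (between 1 and $N_f^k$), $l'$ is the
index of the previous model history (between 1 and $N_f^{k-1}$), and $j$ is the index of the
current model (between 1 and $N_f$) hypothesized by the model history event $M^{k,l}$.
The denominator is expanded as a total probability expansion over all model history
events as in Eq. (2.37).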

The method commonly used to evaluate model history event probabilities is
to assume that the model transition process is a Markov process, such that the
probability of transition depends only on the previous model number $m_{k-1,l'}$, and
not on the prior model history or prior measurements:

$$\begin{aligned}
P\{M_{k,j} | M^{k-1,l'}, Z^{k-1}\} &= P\{M_{k,j} | M^{k-1,l'}\} \\
&= P\{M_{k,j} | M_{1,m_{1,l'}}, M_{2,m_{2,l'}}, \ldots, M_{k-1,m_{k-1,l'}}\} \\
&= P\{M_{k,j} | M_{k-1,m_{k-1,l'}}\} \\
&= P_{m_{k-1,l'},j}(k)
\end{aligned} \tag{2.39}$$

Thus $P_{m_{k-1,l'},j}(k)$ is the probability of transitioning from model index $m_{k-1,l'}$ at sample
time $(k-1)$ to model index $j$ at sample time $k$, where each index is a model number
between 1 and $N_f$.

While the assumption of Eq. (2.39) provides a mechanism for computation of
the model history probability, the conditioning in Eq. (2.38) of the new measurement
probability on the model history still produces an exponentially increasing number
of hypotheses with time, hence further approximation (such as combining branches)
is required. The most commonly used algorithms are described in the following
sections. The structure of the full order Bayesian switching estimator is shown
in Figure 2.3. The diagram demonstrates the growing number of filters which is
required: the output of every filter at the current processing cycle must be processed
in the following processing cycle using every model, hence the number of filtering
operations at the $k$-th cycle is $N_f^k$.
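The exponential growth in model histories, and the way the Markov assumption of Eq. (2.39) assigns each history a prior probability, can be made concrete by brute-force enumeration. In the sketch below the initial probabilities and the transition matrix are hypothetical values for a two-model system, and the transition probabilities are assumed constant over time.

```python
from itertools import product

def history_priors(initial, transition, k):
    """Prior probability of every model history of length k under a
    first-order Markov chain with time-invariant transition matrix."""
    n_f = len(initial)
    priors = {}
    for history in product(range(n_f), repeat=k):
        prob = initial[history[0]]
        # Each transition depends only on the previous model index.
        for prev, cur in zip(history, history[1:]):
            prob *= transition[prev][cur]
        priors[history] = prob
    return priors

# Hypothetical values: model 0 is non-maneuvering, model 1 is maneuvering.
initial = [0.9, 0.1]
transition = [[0.95, 0.05],
              [0.10, 0.90]]
priors = history_priors(initial, transition, k=4)
```

With two models and four samples there are already $2^4 = 16$ histories whose prior probabilities sum to one; the full-order estimator would need one filter per history, which is what motivates the approximations in the following sections.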

2.4.3 First-Order Generalized Pseudo-Bayesian Estimator. The First-Order
Generalized Pseudo-Bayesian (GPB-1) estimator [3:454-456] limits the memory
of the model history events by combining the estimates from all models into a
single estimate at the end of each processing cycle. At the start of each processing
cycle, the information carried forward from the previous measurement interval is
a single combined estimate: any conditioning on previous model history events has
been discarded. Hence the PDF of the estimate is modified from the switching model
of Eq. (2.37):

$$f\{\mathbf{x}(k) | Z^k\} = \sum_{l=1}^{N_f^k} f\{\mathbf{x}(k) | M^{k,l}, Z^k\}\,P\{M^{k,l} | Z^k\}$$

to the simplified version:

$$f\{\mathbf{x}(k) | Z^k\} = \sum_{j=1}^{N_f} f\{\mathbf{x}(k) | M_{k,j}, Z^k\}\,P\{M_{k,j} | Z^k\} \tag{2.40}$$

where the total probability expansion over the entire model history event $M^{k,l}$ is
replaced by expansion over the single most recent model event $M_{k,j}$. Expanding
Eq. (2.40), we further approximate that the previous measurement history $Z^{k-1}$
is adequately represented by the single estimate and covariance from the previous
processing cycle, $\{\hat{\mathbf{x}}(k-1|k-1), \mathbf{P}(k-1|k-1)\}$:

$$\begin{aligned}
f\{\mathbf{x}(k) | Z^k\} &= \sum_{j=1}^{N_f} f\{\mathbf{x}(k) | M_{k,j}, Z^k\}\,P\{M_{k,j} | Z^k\} \\
&= \sum_{j=1}^{N_f} f\{\mathbf{x}(k) | M_{k,j}, \mathbf{z}(k), Z^{k-1}\}\,P\{M_{k,j} | Z^k\} \\
&= \sum_{j=1}^{N_f} f\{\mathbf{x}(k) | M_{k,j}, \mathbf{z}(k), \hat{\mathbf{x}}(k-1|k-1), \mathbf{P}(k-1|k-1)\} \cdot P\{M_{k,j} | Z^k\}
\end{aligned} \tag{2.41}$$


In effect, the approximations of Eqs. (2.40) and (2.41) state that the entire
model transition history and measurement history are representable through the
single estimate from the previous processing cycle. Once the conditional model
probability in Eq. (2.41) has been evaluated using the developments of Eqs. (2.38)
and (2.39), the combined estimate is then calculated as per Eqs. (2.34) and (2.35)
and the cycle repeats.

Figure 2.4. Block diagram of GPB-1 algorithm.

The structure of the GPB-1 algorithm is shown in Figure 2.4. The outputs of
all filters are merged into a single estimate at each processing cycle, which is used as
the input to each of the filters at the next processing cycle, providing a very coarse
approximation of the optimal system shown in Figure 2.3.
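One GPB-1 processing cycle can be sketched for a scalar state that is observed directly ($H = 1$). The sketch follows the common textbook form of the algorithm, in which the prior probability of each model is obtained from the Markov transition probabilities and the previous model probabilities; the scalar models, which differ only in process noise variance, and all numerical values are illustrative assumptions.

```python
import math

def gpb1_step(x_prev, P_prev, mu_prev, z, models, transition, r):
    """One GPB-1 cycle. `models` is a list of (phi, q) pairs, `r` is the
    measurement noise variance, and transition[i][j] is the probability of
    switching from model i to model j."""
    n_f = len(models)
    estimates, likelihoods = [], []
    for phi, q in models:
        # Propagate the single combined estimate under this model.
        x_pred = phi * x_prev
        P_pred = phi * P_prev * phi + q
        # Standard Kalman measurement update.
        S = P_pred + r
        K = P_pred / S
        nu = z - x_pred
        estimates.append((x_pred + K * nu, P_pred - K * P_pred))
        likelihoods.append(math.exp(-0.5 * nu * nu / S)
                           / math.sqrt(2.0 * math.pi * S))
    # Predicted model probabilities from the Markov transition model.
    mu_pred = [sum(transition[i][j] * mu_prev[i] for i in range(n_f))
               for j in range(n_f)]
    weights = [l * m for l, m in zip(likelihoods, mu_pred)]
    total = sum(weights)
    mu = [w / total for w in weights]
    # Collapse to a single estimate, as in Eqs. (2.34) and (2.35).
    x = sum(m * xj for m, (xj, _) in zip(mu, estimates))
    P = sum(m * (Pj + (xj - x) ** 2) for m, (xj, Pj) in zip(mu, estimates))
    return x, P, mu

# Hypothetical models: low process noise (non-maneuvering) and high
# process noise (maneuvering), both with phi = 1.
models = [(1.0, 0.01), (1.0, 4.0)]
transition = [[0.95, 0.05], [0.05, 0.95]]
x, P, mu = gpb1_step(0.0, 1.0, [0.5, 0.5], z=5.0,
                     models=models, transition=transition, r=1.0)
```

A measurement far from the prediction is much better explained by the high-noise model, so nearly all of the probability moves to it and the combined estimate follows the measurement closely.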


2.4.4 Second-Order Generalized Pseudo-Bayesian Estimator. The Second-Order
Generalized Pseudo-Bayesian (GPB-2) estimator [3:457-460] operates on similar
principles to the first-order variant, except that the memory is allowed to extend
an additional processing cycle. Again, the PDF of the estimate is modified from the
full order switching model of Eq. (2.37):

$$f\{\mathbf{x}(k) | Z^k\} = \sum_{l=1}^{N_f^k} f\{\mathbf{x}(k) | M^{k,l}, Z^k\}\,P\{M^{k,l} | Z^k\}$$

to the simplified version, which this time incorporates the previous model $M_{k-1,i}$ in
addition to the current model $M_{k,j}$:

$$f\{\mathbf{x}(k) | Z^k\} = \sum_{i=1}^{N_f}\sum_{j=1}^{N_f} f\{\mathbf{x}(k) | M_{k,j}, M_{k-1,i}, Z^k\}\,P\{M_{k,j}, M_{k-1,i} | Z^k\} \tag{2.42}$$

Manipulating Eq. (2.42) and assuming that the history $\{M_{k-1,i}, Z^{k-1}\}$ is adequately
represented by the combined estimates from the $i$-th model in the previous
processing cycle $\{\hat{\mathbf{x}}_i(k-1|k-1), \mathbf{P}_i(k-1|k-1)\}$, and that (according to the Markov
model) the model transition depends only on the previous model, and not on the
measurement history:

$$\begin{aligned}
f\{\mathbf{x}(k) | Z^k\} &= \sum_{i=1}^{N_f}\sum_{j=1}^{N_f} f\{\mathbf{x}(k) | M_{k,j}, \mathbf{z}(k), M_{k-1,i}, Z^{k-1}\} \cdot \\
&\qquad \cdot P\{M_{k,j} | M_{k-1,i}, Z^k\}\,P\{M_{k-1,i} | Z^k\} \\
&= \sum_{i=1}^{N_f}\sum_{j=1}^{N_f} f\{\mathbf{x}(k) | M_{k,j}, \mathbf{z}(k), \hat{\mathbf{x}}_i(k-1|k-1), \mathbf{P}_i(k-1|k-1)\} \cdot \\
&\qquad \cdot P\{M_{k,j} | M_{k-1,i}\}\,P\{M_{k-1,i} | Z^k\}
\end{aligned} \tag{2.43}$$

Hence the operation of the GPB-2 algorithm is such that the estimate from
each model in the previous processing cycle is processed using each dynamics model,
giving $N_f^2$ total elemental filters. At the end of each processing cycle, the $N_f^2$
estimates are combined down to $N_f$ estimates, combining estimates from different
models in the previous processing cycle to leave one estimate for each model in the
latest processing cycle. This is illustrated in Figure 2.5, which shows the structure
of the algorithm. Comparing the structure to the GPB-1 algorithm shown in Figure
2.4, the GPB-2 algorithm uses $N_f^2$ filters, thus it is able to maintain $N_f$ estimates
and propagate each estimate with each of the $N_f$ filters at each processing interval,
rather than collapsing the PDF of target state down to a single estimate at each
processing interval.
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Figure  2.5.  Block  diagram  of  GPB-2  algorithm. 
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2-4-5  Interacting  Multiple  Model  Estimator.  The  Interacting  Multiple 
Model  (IMM)  estimator  [31461-465,  32]  is  a  methodology  which  achieves  compa¬ 
rable  performance  to  the  GPB-2  estimator  using  only  Nf  elemental  filters,  rather 
than  Nf 2  as  required  by  the  latter.  The  algorithm  can  be  derived  by  considering  the 
limitations  inherent  to  the  problem:  if  only  Nf  elemental  filters  are  allowable,  then 
the  input  to  the  j'-tli  filter  should  be  the  best  estimate  of  the  state  at  time  instant 
(k  —  1),  conditioned  on  the  event  that  model  j  is  in  force  at  time  instant  k  (the  new 
sample  time),  f{x(k  —  l)\Mkj,  Zk~1}.  Using  this  expression  as  a  starting  point, 
we  follow  a  single  iteration  of  the  algorithm,  through  to  the  calculation  of  the  same 
function  at  the  following  sample  period. 

Following a standard Kalman filter propagate-update cycle at the $k$-th sample
time, the output of the $j$-th elemental filter will be $f\{x(k) \mid M_{k,j}, Z^k\}$. The
requirement for the IMM algorithm is thus to combine the estimates from the $N_f$ elemental
filters to calculate the inputs $f\{x(k) \mid M_{k+1,i}, Z^k\}$ of each elemental filter for the next
processing cycle.

The overall PDF formed using the information from all $N_f$ filters represents
the total information contained by the system at time $k$:

$$f\{x(k) \mid Z^k\} = \sum_{j=1}^{N_f} f\{x(k) \mid M_{k,j}, Z^k\}\, P\{M_{k,j} \mid Z^k\} \tag{2.44}$$

The goal of the intermixing is thus to manipulate Eq. (2.44) into the expansion necessary
at the input to the next processing cycle:

$$f\{x(k) \mid Z^k\} = \sum_{i=1}^{N_f} f\{x(k) \mid M_{k+1,i}, Z^k\}\, P\{M_{k+1,i} \mid Z^k\} \tag{2.45}$$

The last factor in Eq. (2.45) is easily evaluated using the Markov assumption as per
Eq.  (M2): 


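$$\begin{aligned}
P\{M_{k+1,i} \mid Z^k\} &= \sum_{j=1}^{N_f} P\{M_{k+1,i} \mid M_{k,j}, Z^k\}\, P\{M_{k,j} \mid Z^k\} \\
&= \sum_{j=1}^{N_f} P\{M_{k+1,i} \mid M_{k,j}\}\, P\{M_{k,j} \mid Z^k\}
\end{aligned} \tag{2.46}$$

Note that if $\mathbf{T}(k+1|k)$ is the Markov transition matrix such that:

$$\{\mathbf{T}\}_{ij} = P\{M_{k+1,i} \mid M_{k,j}\}$$

then Eq. (2.46) is simply a matrix multiplication of $\mathbf{T}(k+1|k)$ by the vector with
elements that are the probabilities $P\{M_{k,j} \mid Z^k\}$, yielding the vector of components
that represent the probabilities $P\{M_{k+1,i} \mid Z^k\}$.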
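The matrix-vector form of Eq. (2.46) can be sketched in a few lines of Python. This is only an illustration: the function name `predict_model_probabilities` and the two-model transition matrix are hypothetical values chosen for the example, not taken from the source.

```python
def predict_model_probabilities(T, mu):
    """Eq. (2.46): multiply the transition matrix by the model probabilities.

    T[i][j] is P{M_{k+1,i} | M_{k,j}} and mu[j] is P{M_{k,j} | Z^k}.
    """
    return [sum(T[i][j] * mu[j] for j in range(len(mu))) for i in range(len(T))]


# Hypothetical two-model example. Column j holds the transitions out of
# model j, so every column of T sums to one.
T = [[0.9, 0.2],
     [0.1, 0.8]]
mu = [0.5, 0.5]
predicted = predict_model_probabilities(T, mu)
```

Because each column of the matrix sums to one, the predicted probabilities again sum to one without any explicit normalization.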

The leading factor in the sum of Eq. (2.45) is then expanded using the total
probability theorem over the previous model index $j$:

$$f\{x(k) \mid M_{k+1,i}, Z^k\} = \sum_{j=1}^{N_f} f\{x(k) \mid M_{k+1,i}, M_{k,j}, Z^k\}\, P\{M_{k,j} \mid M_{k+1,i}, Z^k\} \tag{2.47}$$


where  the  backward  transition  probabilities  are  calculated  by: 


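$$\begin{aligned}
P\{M_{k,j} \mid M_{k+1,i}, Z^k\} &= \frac{P\{M_{k,j}, M_{k+1,i} \mid Z^k\}}{P\{M_{k+1,i} \mid Z^k\}} \\
&= \frac{P\{M_{k+1,i} \mid M_{k,j}, Z^k\}\, P\{M_{k,j} \mid Z^k\}}{P\{M_{k+1,i} \mid Z^k\}} \\
&= \frac{P\{M_{k+1,i} \mid M_{k,j}, Z^k\}\, P\{M_{k,j} \mid Z^k\}}{\sum_{n=1}^{N_f} P\{M_{k+1,i} \mid M_{k,n}, Z^k\}\, P\{M_{k,n} \mid Z^k\}}
\end{aligned} \tag{2.48}$$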


According to the Markov assumption, the transition probability $P\{M_{k+1,i} \mid M_{k,j}, Z^k\}$
does not depend on the measurement history $Z^k$, hence this conditioning is dropped.

Assuming that the estimator history $Z^k$ is adequately modelled by the $N_f$
estimates from the previous processing cycle (each estimate conditioned on a different
model $M_{k,j}$), Eq. (2.47) is then approximated by a single Gaussian density:^4

$$\begin{aligned}
f\{x(k) \mid M_{k+1,i}, Z^k\} &\approx \sum_{j=1}^{N_f} \mathcal{N}\{x(k); \hat{x}_j(k|k), P_j(k|k)\}\, P\{M_{k,j} \mid M_{k+1,i}, Z^k\} \\
&\approx \mathcal{N}\{x(k); \hat{x}^i(k|k), P^i(k|k)\}
\end{aligned} \tag{2.49}$$


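where the mean and covariance of the Gaussian are given by:

$$\begin{aligned}
\hat{x}^i(k|k) &= \sum_{j=1}^{N_f} P\{M_{k,j} \mid M_{k+1,i}, Z^k\}\, \hat{x}_j(k|k) \\
P^i(k|k) &= \sum_{j=1}^{N_f} P\{M_{k,j} \mid M_{k+1,i}, Z^k\} \left\{ P_j(k|k) + [\hat{x}_j(k|k) - \hat{x}^i(k|k)][\hat{x}_j(k|k) - \hat{x}^i(k|k)]^T \right\}
\end{aligned} \tag{2.50}$$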
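The mixing step of Eqs. (2.46), (2.48) and (2.50) can be sketched as follows. This is a minimal illustration in plain Python, assuming vectors and matrices are stored as nested lists and that every predicted model probability is non-zero; the name `imm_mix` and the scalar-state example values are hypothetical.

```python
def imm_mix(T, mu, means, covs):
    """Mix the N_f filter outputs into N_f filter inputs.

    T[i][j] = P{M_{k+1,i} | M_{k,j}}, mu[j] = P{M_{k,j} | Z^k};
    means[j] (list) and covs[j] (list of lists) are the output of filter j.
    """
    n_models, dim = len(mu), len(means[0])
    mixed_means, mixed_covs = [], []
    for i in range(n_models):
        # Backward transition probabilities of Eq. (2.48).
        joint = [T[i][j] * mu[j] for j in range(n_models)]
        total = sum(joint)  # P{M_{k+1,i} | Z^k}, Eq. (2.46); assumed non-zero
        back = [p / total for p in joint]
        # Mixed mean: probability-weighted average of the filter means.
        x_mix = [sum(back[j] * means[j][r] for j in range(n_models))
                 for r in range(dim)]
        # Mixed covariance: filter covariances plus the spread of the means.
        P_mix = [[0.0] * dim for _ in range(dim)]
        for j in range(n_models):
            d = [means[j][r] - x_mix[r] for r in range(dim)]
            for r in range(dim):
                for c in range(dim):
                    P_mix[r][c] += back[j] * (covs[j][r][c] + d[r] * d[c])
        mixed_means.append(x_mix)
        mixed_covs.append(P_mix)
    return mixed_means, mixed_covs


# Hypothetical example: two models with a scalar state.
T = [[0.9, 0.2],
     [0.1, 0.8]]
mixed_means, mixed_covs = imm_mix(T, [0.5, 0.5],
                                  means=[[0.0], [2.0]],
                                  covs=[[[1.0]], [[1.0]]])
```

The outer-product term is what keeps the mixed covariance from understating the uncertainty when the filter means disagree.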


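The a posteriori model probabilities $P\{M_{k,j} \mid Z^k\}$ required for Eq. (2.48) are
calculated recursively using the expressions:

$$\begin{aligned}
P\{M_{k,j} \mid Z^k\} &= P\{M_{k,j} \mid z(k), Z^{k-1}\} \\
&= \frac{f\{M_{k,j}, z(k) \mid Z^{k-1}\}}{f\{z(k) \mid Z^{k-1}\}} \\
&= \frac{f\{z(k) \mid M_{k,j}, Z^{k-1}\}\, P\{M_{k,j} \mid Z^{k-1}\}}{f\{z(k) \mid Z^{k-1}\}}
\end{aligned} \tag{2.51}$$

^4 Note that $\{\hat{x}_j(k|k), P_j(k|k)\}$ is taken to refer to the filter estimate at the output of the previous
processing cycle, while $\{\hat{x}^i(k|k), P^i(k|k)\}$ represents the mixed estimates to be provided at the input
to the next processing cycle.

As in Eq. (2.46), $P\{M_{k,j} \mid Z^{k-1}\}$ can be expanded using the total probability
theorem as:

$$\begin{aligned}
P\{M_{k,j} \mid Z^{k-1}\} &= \sum_{i=1}^{N_f} P\{M_{k,j} \mid M_{k-1,i}, Z^{k-1}\}\, P\{M_{k-1,i} \mid Z^{k-1}\} \\
&= \sum_{i=1}^{N_f} P\{M_{k,j} \mid M_{k-1,i}\}\, P\{M_{k-1,i} \mid Z^{k-1}\}
\end{aligned} \tag{2.52}$$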


where  the  assumption  that  the  model  transition  probability  does  not  depend  on  the 
measurement  history  is  again  invoked. 

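Thus by substituting in Eq. (2.52) and expanding the denominator using the
total probability theorem, Eq. (2.51) becomes:

$$P\{M_{k,j} \mid Z^k\} = \frac{f\{z(k) \mid M_{k,j}, Z^{k-1}\}\, P\{M_{k,j} \mid Z^{k-1}\}}{\sum_{n=1}^{N_f} f\{z(k) \mid M_{k,n}, Z^{k-1}\}\, P\{M_{k,n} \mid Z^{k-1}\}} \tag{2.53}$$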


(Note  that  the  denominator  is  simply  the  scaling  factor  necessary  to  ensure  that  the 
conditional  model  probabilities  sum  to  unity.) 
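The model probability update amounts to weighting each predicted model probability by the likelihood of the new measurement under that model and then normalizing. A minimal sketch, assuming the per-model likelihoods have already been evaluated by the elemental filters (the function name and the numbers are hypothetical):

```python
def update_model_probabilities(likelihoods, predicted):
    """Posterior model probabilities P{M_{k,j} | Z^k}.

    likelihoods[j] = f{z(k) | M_{k,j}, Z^{k-1}} from elemental filter j;
    predicted[j]   = P{M_{k,j} | Z^{k-1}} from the Markov prediction.
    """
    unnormalized = [lik * p for lik, p in zip(likelihoods, predicted)]
    scale = sum(unnormalized)  # denominator: ensures the result sums to one
    return [u / scale for u in unnormalized]


# Hypothetical values: model 0 explains the measurement four times better.
posterior = update_model_probabilities([0.2, 0.05], [0.55, 0.45])
```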

As  per  the  preceding  multiple  model  techniques,  the  combined  estimate  is 
calculated  at  each  processing  cycle  to  give  the  output  of  the  estimator.  A  block 
diagram of the IMM algorithm is shown in Figure 2.6. The structure is very similar
to  the  non-switching  MMAE  structure  shown  in  Figure  EP  there  are  Nf  filters,  each 
of  which  is  supplied  with  a  different  input.  However,  rather  than  passing  the  output 
of  each  filter  directly  into  the  same  filter  at  the  next  processing  cycle,  the  algorithm 
mixes  the  estimates  according  to  the  Markov  transition  model  in  order  to  allow  the 
system  to  react  to  changes  to  the  model  in  force. 

Figure 2.6. Block diagram of IMM algorithm.

2.4.6 Summary. The previous sections have presented the commonly used
multiple model estimation structures. The traditional MMAE is based on the
assumption that the model in force does not change with time; ad hoc modifications
extend the algorithm to provide adequate performance in a switching model
environment. The switching model estimators all utilize a Markov model for transition
probabilities; the most commonly used algorithm is the IMM, which provides similar
performance to the GPB-2 at a fraction of the computational cost.


2.5  Data  Association 


Surveillance radar systems^5 typically operate by steering the radar beam in
a  repetitive  scan  pattern,  such  as  a  circular  scan  (in  which  the  radar  antenna  is 
rotated  around  360°  in  the  horizontal  plane  at  a  constant  rate),  a  sector  scan  (in 
which  the  antenna  is  moved  forwards  and  backwards  across  a  fixed  horizontal  arc)  or 
a  two-dimensional  raster  scan  (effectively  a  number  of  sector  “scan  bars” ,  separated 
in  the  vertical  plane).  At  the  end  of  each  scan  interval,  a  series  of  radar  detections 
will  have  been  made,  which  indicate  the  possible  presence  of  a  target  at  a  particular 
location.  The  data  supplied  with  each  measurement  may  include  angle  (azimuth 
and/or  elevation),  range  and  Doppler  shift,  each  of  which  will  be  corrupted  by  noise. 


At the same time, the radar is maintaining a track file, containing a listing
of  known  targets  alongside  state  information  such  as  location,  velocity  and  acceler¬ 
ation,  and  possibly  identification  information.  Fundamentally,  the  role  of  the  data 
association  algorithm  is  to  determine  how  to  update  the  existing  tracks  using  the 
incoming  block  of  measurements.  The  difficulty  is  that  the  measurements  are  not 
labelled:  the  radar  system  does  not  know  to  which  target  the  measurements  belong, 
or  whether  they  belong  to  a  target  at  all  (i.e. ,  they  may  be  false  detections,  such 
as those caused by radar clutter). This is illustrated in Figure 2.7: the solution
of  how  to  update  the  target  states  (as  illustrated  by  the  solid  dots)  for  the  given 
measurements  (illustrated  by  the  plus  marks)  is  neither  obvious  nor  simple. 


^5 Especially mechanically scanned radar systems.

Figure 2.7. The data association problem: how to update target state
given a series of unlabelled measurements. (Legend: + measurements, • targets.)

The  performance  of  the  radar  system  as  a  whole  is  impacted  greatly  by  the 
data  association  algorithm.  The  data  association  algorithm  handles  tasks  from  ini¬ 
tial  track  forming  (when  targets  are  first  detected),  to  track  update  (maintaining 
accurate  state  estimates  while  targets  are  under  track),  and  finally  track  deletion. 
The characteristics of the data association algorithm affect the ability
of the system to reject false measurements, the accuracy of the track maintained by
the system, and the likelihood of loss of track (when the estimate deviates unrecoverably
from the actual target position, or the system incorrectly declares that the
target  no  longer  exists).  Maintaining  continuity  of  tracks  is  a  high  priority  for  radar 
systems,  as  this  provides  a  much  clearer  view  to  the  radar  operator  and  systems  that 
use  the  radar  output,  and  it  helps  to  keep  the  link  between  target  position  data  and 
identification  information. 

The following sections derive the probabilistic model utilized in almost every
modern tracking algorithm, and then describe the different approximations
applied by the various techniques. While the initial descriptions of each of the
conventional techniques lead to highly inefficient implementations, they are in fact
algebraically equivalent to the more efficient implementations presented in the
various references given. Although the manner of presentation is unlike any reference of
which the author is aware, it is our opinion that it leads to a clearer understanding
of the approximations inherent to the various techniques used in modern tracking
systems, and the consequent strengths and weaknesses of the algorithms.

Figure 2.8. Measurement gating in a multiple target environment.

2.5.1  Measurement  Gating.  Measurement  gating  is  a  technique  used  in 
virtually  all  data  association  algorithms  to  avoid  the  computation  time  of  processing 
association  possibilities  which  are  kinematically  impossible  or  statistically  improb¬ 
able.  The  concept  of  measurement  gating  is  that,  if  a  measurement  is  not  within 
some  predefined  distance  of  a  track  (i.e.,  within  the  track’s  association  gate),  then 
that  measurement-track  pairing  is  extremely  unlikely  to  be  correct,  and  thus  it  is  not 
considered for association. This is illustrated in Figure 2.8: only those measurements
within  the  shaded  region  around  each  target  are  processed.  In  order  to  have  the  data 
association  algorithm  consider  all  plausible  assignment  options,  the  association  gate 
is  generally  selected  to  be  quite  large,  typically  designed  to  incorporate  at  least  98% 
of  the  hypervolume  under  the  PDF  of  the  predicted  location.  This  hypervolume  is 
the  probability  that  the  target-originated  measurement  will  fall  within  the  gate,  and 
hence  it  is  denoted  as  Pg.  Throughout  this  document  the  terms  measurement  gate 
and  association  gate  will  be  used  interchangeably. 

The calculations for measurement gating are performed using the expression of
Eq. (2.14). If $\hat{z}_i(k|k-1)$ is the predicted location of the measurement belonging to
target $i$, and $S_i(k)$ is the covariance of the residual formed using the measurement
belonging to target $i$, then the $j$-th measurement will be considered for association
with target $i$ if:

$$[z_j(k) - \hat{z}_i(k|k-1)]^T\, S_i^{-1}(k)\, [z_j(k) - \hat{z}_i(k|k-1)] < \gamma \tag{2.54}$$

where $\gamma$ is the threshold calculated from the desired value of the probability that the
correct measurement is in the gate ($P_g$) using $\chi^2$ tables as described in [4:95-96].
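The gate test can be illustrated for the common case of a two-dimensional measurement. The sketch below assumes two degrees of freedom, for which the chi-square threshold has the closed form $\gamma = -2\ln(1 - P_g)$ and no tables are needed; other measurement dimensions would require a table or library lookup. The function names and example values are hypothetical.

```python
import math


def gate_threshold_2d(p_gate):
    """Chi-square threshold for a 2-D measurement (two degrees of freedom)."""
    return -2.0 * math.log(1.0 - p_gate)


def in_gate(z, z_pred, S, gamma):
    """Gate test of Eq. (2.54) for 2-D vectors and a 2x2 residual covariance S."""
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
    # Quadratic form d^T S^{-1} d using the closed-form 2x2 inverse.
    d2 = (S[1][1] * dx * dx
          - (S[0][1] + S[1][0]) * dx * dy
          + S[0][0] * dy * dy) / det
    return d2 < gamma


# Hypothetical example: gate sized for P_g = 0.98.
gamma = gate_threshold_2d(0.98)
S = [[4.0, 0.0],
     [0.0, 1.0]]
near = in_gate([3.0, 1.0], [0.0, 0.0], S, gamma)  # distance 3.25
far = in_gate([6.0, 1.0], [0.0, 0.0], S, gamma)   # distance 10.0
```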

Measurement  gating  also  provides  a  mechanism  to  break  the  data  association 
problem  up  into  manageable  portions,  or  clusters.  A  cluster  contains  all  targets 
which  have  common  measurements  within  their  association  gates.  For  example,  if 
there  are  four  targets,  and  targets  1  and  2  share  a  measurement  (as  illustrated  by  the 
measurements contained within both association gates in Figure 2.8), targets 2 and
3  share  a  measurement  and  target  4  does  not  share  any  measurements,  then  targets 
1,  2  and  3  will  be  in  one  cluster  and  target  4  will  be  in  its  own  cluster.  When  targets 
do  not  share  measurements,  the  separate  clusters  may  be  processed  as  independent 
tracking  problems,  greatly  reducing  the  number  of  joint  hypotheses,  as  introduced 
below. 
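Cluster formation is a connected-components problem on the gate test results. The following sketch uses a small union-find structure; it is one possible approach, not necessarily the one used by the author, and it assumes a zero-based boolean matrix `valid[i][j]` that is true when measurement j lies inside the gate of target i.

```python
def cluster_targets(valid):
    """Group targets that are linked through shared gated measurements."""
    n_targets = len(valid)
    n_meas = len(valid[0]) if valid else 0
    parent = list(range(n_targets))

    def find(i):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for j in range(n_meas):
        sharing = [i for i in range(n_targets) if valid[i][j]]
        for i in sharing[1:]:
            parent[find(i)] = find(sharing[0])  # merge the two clusters

    clusters = {}
    for i in range(n_targets):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# The four-target example from the text, with zero-based indices:
# targets 0 and 1 share measurement 0, targets 1 and 2 share measurement 1.
valid = [[True,  False, False],
         [True,  True,  False],
         [False, True,  False],
         [False, False, True]]
clusters = cluster_targets(valid)
```

For this input the first three targets fall into one cluster and the fourth target forms its own, matching the example above.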


2.5.2  Association  Event  Probability.  The  basis  of  each  of  the  data  as¬ 
sociation  algorithms  discussed  in  this  chapter  is  the  probabilistic  model  for  joint 
association  events,  such  as  that  described  in  [2]  51  [5]  [7].  The  model  is  derived  in 
detail  in  the  next  pages,  followed  by  descriptions  of  the  approximations  utilized  by 
the  various  conventional  tracking  algorithms. 

^6 i.e., the $j$-th measurement is inside target $i$'s association gate.

We use the notation $\Theta_l(k)$ to denote the $l$-th joint association event at sample
period $k$. Each joint association event represents a hypothesis on the origin of each
measurement.^7 For example, a typical joint hypothesis might be:

$$\Theta_l(k) = \{\theta_{12}, \theta_{21}, \theta_{34}, \theta_{40}\}$$

where the elemental event $\theta_{ji}$ represents the association of measurement $j$ with target
$i$, and $\theta_{j0}$ represents the event that no target is associated with measurement $j$,
indicating that measurement $j$ is clutter-originated. In the example above, measurement
1  has  been  associated  with  target  2,  measurement  2  with  target  1,  measurement  3 
with  target  4,  and  measurement  4  is  the  result  of  clutter.  When  combined  with  the 
knowledge  of  the  number  of  targets  present  at  time  k,  knowledge  of  which  targets 
have  and  have  not  been  detected  is  implicitly  contained  in  the  event;  in  the  example 
above  targets  1,  2  and  4  were  detected,  thus  if  there  were  four  targets  under  track 
at  time  k  then  target  3  was  missed. 

The  requirements  placed  on  joint  association  events  provide  a  mechanism  to 
embed  physically  meaningful  stipulations  into  the  probabilistic  model.  The  most 
common  requirements  used  are  that  each  measurement  can  be  associated  with  no 
more  than  one  target,  and  each  target  can  be  associated  with  no  more  than  one 
measurement.  These  requirements  overlook  possibilities  in  which  two  targets  are 
within  the  same  radar  resolution  cell  and  produce  a  single  merged  measurement, 
and  possibilities  in  which  a  target  is  close  enough  that  it  occupies  multiple  radar 
resolution  cells  and  produces  multiple  measurements.  However,  they  also  preclude 
associations  which  are  obviously  invalid,  such  as  two  broadly  spaced  targets  giving 
rise  to  a  single  common  measurement,  or  a  single  target  giving  rise  to  two  broadly 
spaced  measurements,  and  where  necessary  the  previously  discussed  possibilities  may 
be  handled  as  exceptions. 

^7 Only those measurements inside the union of the measurement gates of each target in the
cluster are considered.

Following the development of [4:314-317], the probability of a joint association
event can be evaluated using two successive applications of Bayes' rule:

$$\begin{aligned}
P\{\Theta_l(k) \mid Z^k\} &= P\{\Theta_l(k) \mid Z_k, N_m(k), Z^{k-1}\} \\
&= \frac{f\{Z_k, \Theta_l(k) \mid N_m(k), Z^{k-1}\}}{f\{Z_k \mid N_m(k), Z^{k-1}\}} \\
&= \frac{f\{Z_k \mid \Theta_l(k), N_m(k), Z^{k-1}\}\, P\{\Theta_l(k) \mid N_m(k), Z^{k-1}\}}{f\{Z_k \mid N_m(k), Z^{k-1}\}}
\end{aligned} \tag{2.55}$$

where $N_m(k)$ is the number of measurements in the combined gating region at scan
$k$, which is inherent in the knowledge of the measurements themselves.


The leading term in the numerator of Eq. (2.55) amounts to the a priori likelihood
of the measurements received in scan $k$, conditioned on the past measurements
($Z^{k-1}$), the number of measurements in the current cycle ($N_m(k)$) and the joint
association event ($\Theta_l(k)$). The notation of the capital $Z_k$ is used to represent the
joint state of all measurements, rather than the marginal PDF of a single measurement.
If the $j$-th measurement is hypothesized to be clutter-originated (such that
$\theta_{j0} \in \Theta_l$), then its PDF is modelled as uniform within the combined measurement
gate. Denoting $\mathcal{V}(k)$ as the union of the measurement gating regions of all targets
in the cluster, and $V(k)$ as the volume of this region, the individual measurement
PDFs of clutter-originated measurements can be evaluated as^8

$$f\{z_j(k) \mid \Theta_l(k), N_m(k), Z^{k-1}\} = \begin{cases} V(k)^{-1} & : z_j(k) \in \mathcal{V}(k) \\ 0 & : z_j(k) \notin \mathcal{V}(k) \end{cases} \tag{2.56}$$

The evaluation of the components of this PDF which are target-originated is
discussed in Section 2.5.4.


^8 Note that the measurement is guaranteed to be within the combined association region by the
prior application of gating, hence the second case never arises.

The second term in the numerator of Eq. (2.55), $P\{\Theta_l(k) \mid N_m(k), Z^{k-1}\}$, is the
probability of the joint association event $\Theta_l(k)$ for the current scan, conditioned only
on the number of measurements in the association gate ($N_m(k)$) and the measurement
history prior to the current sample period. In the absence of any information
about the value of the current measurements, the prior measurement history is
assumed to contain no information about the current association event such that:

$$P\{\Theta_l(k) \mid N_m(k), Z^{k-1}\} = P\{\Theta_l(k) \mid N_m(k)\}$$


This prior event probability is evaluated by considering the target detections,
missed detections and clutter measurements hypothesized in the event $\Theta_l(k)$.
Denoting $\delta(\Theta_l)$ as the vector of target detection indicators^9 and $\phi(\Theta_l)$ as the number of
measurements originating from clutter,^10 both of which are intrinsic in the knowledge
of the association event $\Theta_l(k)$:

$$\begin{aligned}
P\{\Theta_l(k) \mid N_m(k)\} &= P\{\Theta_l(k), \delta(\Theta_l), \phi(\Theta_l) \mid N_m(k)\} \\
&= P\{\Theta_l(k) \mid \delta(\Theta_l), \phi(\Theta_l), N_m(k)\}\, P\{\delta(\Theta_l), \phi(\Theta_l) \mid N_m(k)\}
\end{aligned} \tag{2.57}$$


^9 i.e., the $i$-th element of $\delta$ is '1' if target $i$ is hypothesized as being detected in event $\Theta_l(k)$, or
'0' if target $i$ is hypothesized to have been missed in event $\Theta_l(k)$.

^10 i.e., the total number of measurements minus the number of targets hypothesized as having
been detected.

The first term in Eq. (2.57) is evaluated by assuming that all joint association
events that contain the same set of detected targets and the same number of clutter
measurements are equally likely. The count of such events is the number of
permutations possible when selecting $\vartheta(\Theta_l) = \sum_i \delta_i(\Theta_l) = N_m(k) - \phi(\Theta_l)$ (the number of
detected targets) measurements out of $N_m(k)$ (the total number of measurements).
This is the classic "balls out of an urn" problem without replacement and considering
order, and is evaluated as [32144]:

$$P^{N_m(k)}_{\vartheta(\Theta_l)} = \frac{N_m(k)!}{[N_m(k) - \vartheta(\Theta_l)]!} = \frac{N_m(k)!}{\phi(\Theta_l)!}$$

Subsequently the probability of each equally likely event is:

$$P\{\Theta_l(k) \mid \delta(\Theta_l), \phi(\Theta_l), N_m(k)\} = \left( P^{N_m(k)}_{\vartheta(\Theta_l)} \right)^{-1} = \frac{\phi(\Theta_l)!}{N_m(k)!}$$
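As a worked check of this count, consider the sample event introduced earlier, in which four measurements are received, three of them are assigned to targets and one is attributed to clutter. With $N_m = 4$, $\vartheta = 3$ and $\phi = 1$:

```latex
P^{4}_{3} = \frac{4!}{(4-3)!} = \frac{4!}{1!} = 24,
\qquad
P\{\Theta_l \mid \delta, \phi, N_m\} = \frac{\phi!}{N_m!} = \frac{1!}{4!} = \frac{1}{24}
```

That is, there are 24 ordered ways to hand three of the four measurements to the three detected targets, and each such event receives prior probability 1/24 under the equally-likely assumption.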

In the traditional development of the algorithm, as presented in [2:226]
and [4:315], the second term in Eq. (2.57) is evaluated by assuming independence
among $\delta(\Theta_l)$ and $\phi(\Theta_l)$ such that:

$$P\{\delta(\Theta_l), \phi(\Theta_l) \mid N_m(k)\} = P\{\delta(\Theta_l)\}\, P\{\phi(\Theta_l)\} \tag{2.58}$$

Strictly, this assumption of independence is invalid when conditioned on the number
of measurements $N_m(k)$, because once given the target detection vector $\delta(\Theta_l)$ and
the number of measurements $N_m(k)$, one implicitly knows $\phi(\Theta_l)$ by the relationship:

$$\phi(\Theta_l) = N_m(k) - \sum_i \delta_i(\Theta_l)$$


However, one can arrive at essentially the same result by applying Bayes' rule
twice to remove the conditioning, resulting in:

$$\begin{aligned}
P\{\delta(\Theta_l), \phi(\Theta_l) \mid N_m(k)\} &= \frac{P\{\delta(\Theta_l), \phi(\Theta_l), N_m(k)\}}{P\{N_m(k)\}} \\
&= \frac{P\{N_m(k) \mid \delta(\Theta_l), \phi(\Theta_l)\}\, P\{\delta(\Theta_l), \phi(\Theta_l)\}}{P\{N_m(k)\}} \\
&= \frac{P\{\delta(\Theta_l)\}\, P\{\phi(\Theta_l)\}}{P\{N_m(k)\}}
\end{aligned} \tag{2.59}$$

where $P\{N_m(k) \mid \delta(\Theta_l), \phi(\Theta_l)\}$ is cancelled in the final step as it will evaluate to
unity for any consistent association event, and the independence of $\delta(\Theta_l)$ and $\phi(\Theta_l)$ is
assumed this time without conditioning on $N_m(k)$. The denominator term $P\{N_m(k)\}$
may be evaluated through the total probability expansion:

$$P\{N_m(k)\} = \sum_{\{\delta, \phi\}} P\{\delta\}\, P\{\phi\}$$

where the sum is over all possible $\{\delta, \phi\}$ such that $\phi + \sum_i \delta_i = N_m(k)$. However,
since the denominator is identical for all association events, the term will contribute
a constant scaling factor to all terms, which will be cancelled when the association
events are normalized to sum to unity.

The a priori probability of the target detection vector $P\{\delta(\Theta_l)\}$ is evaluated
by assuming independence between each of the target detection possibilities; e.g., if
the target detection vector proposes that $\vartheta$ of the $N_t$ targets were detected, then:

$$P\{\delta\} = P_{dg}^{\vartheta}\, (1 - P_{dg})^{N_t - \vartheta} \tag{2.60}$$

where $P_{dg}$ is the probability that any one of the $N_t$ targets will be detected, and
that the resulting measurement is within the association gate. If $P_d$ is the target
detection probability and $P_g$ is the probability that the target-originated measurement
is within the association gate, then:

$$P_{dg} = P_d P_g$$

and thus:

$$P\{\delta\} = (P_d P_g)^{\vartheta}\, (1 - P_d P_g)^{N_t - \vartheta} \tag{2.61}$$

The a priori probability of the number of clutter measurements $P\{\phi(\Theta_l)\}$ is
evaluated utilizing a Poisson model [3:135] with parameter $\lambda V$, where $\lambda$ represents
the density of false measurements within the validation region (i.e., the expected
number of clutter detections per unit hypervolume in measurement space), and $V$ is
the hypervolume of the combined gating region for all targets:

$$P\{\phi\} = \frac{(\lambda V)^{\phi}\, e^{-\lambda V}}{\phi!}$$


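Collecting terms from Eq. (2.55) we arrive at the following expression (time
arguments are omitted where unambiguous):

$$\begin{aligned}
P\{\Theta_l \mid Z^k\} &= \frac{f\{Z_k \mid \Theta_l, N_m, Z^{k-1}\}\, P\{\Theta_l \mid \delta, \phi, N_m\}\, P\{\delta\}\, P\{\phi\}}{f\{Z_k \mid N_m, Z^{k-1}\}\, P\{N_m\}} \\
&= \frac{1}{c}\, f\{Z_k \mid \Theta_l, N_m, Z^{k-1}\}\, \frac{\phi!}{N_m!}\, \frac{(\lambda V)^{\phi} e^{-\lambda V}}{\phi!}\, P_{dg}^{\vartheta} (1 - P_{dg})^{N_t - \vartheta} \\
&= \frac{1}{c'}\, f\{Z_k \mid \Theta_l, N_m, Z^{k-1}\}\, (\lambda V)^{\phi}\, P_{dg}^{\vartheta} (1 - P_{dg})^{N_t - \vartheta}
\end{aligned} \tag{2.62}$$

where $c$ is the denominator of the first expression in Eq. (2.62), evaluated as the
sum of all numerators using the total probability expansion and the simplifications
of Eqs. (2.57)-(2.59):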


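$$\begin{aligned}
c &= f\{Z_k \mid N_m(k), Z^{k-1}\}\, P\{N_m(k)\} \\
&= \sum_l f\{Z_k \mid \Theta_l, N_m, Z^{k-1}\}\, P\{\Theta_l \mid N_m, Z^{k-1}\}\, P\{N_m\} \\
&= \sum_l f\{Z_k \mid \Theta_l, N_m, Z^{k-1}\}\, P\{\Theta_l \mid N_m\}\, P\{N_m\} \\
&= \sum_l f\{Z_k \mid \Theta_l, N_m, Z^{k-1}\}\, P\{\Theta_l \mid \delta, \phi, N_m\}\, \frac{P\{\delta\}\, P\{\phi\}}{P\{N_m\}}\, P\{N_m\} \\
&= \sum_l f\{Z_k \mid \Theta_l, N_m, Z^{k-1}\}\, P\{\Theta_l \mid \delta, \phi, N_m\}\, P\{\delta\}\, P\{\phi\}
\end{aligned} \tag{2.63}$$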

As the above expression (after applying the summation) is the same for all association
events, the term $c$ merely functions as a normalization constant, ensuring that the
probabilities of all joint association events sum to unity. The constant is modified
to $c'$ after the incorporation of the Poisson exponential term $e^{-\lambda V}$ and the factorial
of the number of measurements $N_m!$:

$$c' = \frac{c\, N_m!}{e^{-\lambda V}}$$

The volume of the combined validation region for the targets, $V(k)$, is not
easily calculated. However, considering that Eq. (2.62) contains a $V^{\phi}$ term (arising
from the Poisson model for clutter), and that $f\{Z_k \mid \Theta_l, N_m, Z^{k-1}\}$ will include a
$V^{-1}$ term for each measurement believed to be the result of clutter (as discussed
in Eq. (2.56)), of which there are $\phi$, the $V(k)$ terms will cancel, leaving only terms
involving the clutter density $\lambda$.
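Once the volume terms have cancelled, the unnormalized weight of a joint event depends only on the clutter density, the detection probability and the likelihoods of the target-assigned measurements. The sketch below illustrates this under the equal-detection-probability simplification of Eq. (2.62); the function names and numerical values are hypothetical, and the Gaussian likelihoods are assumed to have been evaluated elsewhere.

```python
def joint_event_weight(target_likelihoods, n_clutter, n_targets,
                       clutter_density, p_dg):
    """Unnormalized probability of one joint association event.

    target_likelihoods: likelihood of each measurement assigned to a target
    n_clutter: phi, the number of measurements attributed to clutter
    clutter_density: lambda; p_dg: detection-and-gate probability P_d * P_g
    """
    n_detected = len(target_likelihoods)
    weight = clutter_density ** n_clutter
    weight *= p_dg ** n_detected * (1.0 - p_dg) ** (n_targets - n_detected)
    for likelihood in target_likelihoods:
        weight *= likelihood
    return weight


def normalize(weights):
    """Scale the event weights so that they sum to unity (the role of c')."""
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical single-target, single-measurement cluster:
# event 0 = target missed, measurement is clutter; event 1 = target detected.
weights = [joint_event_weight([], 1, 1, clutter_density=0.1, p_dg=0.9),
           joint_event_weight([0.5], 0, 1, clutter_density=0.1, p_dg=0.9)]
probabilities = normalize(weights)
```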

Eq. (2.62) can be easily modified to admit the case of different detection
probabilities for each target (as is motivated physically by the variation of detection
probability  with  radar  return  power  level  and  signal  to  noise  ratio).  However,  the 
simplification  should  be  a  fair  approximation  when  considering  targets  of  similar 
physical  size  (and  radar  cross  section)  within  a  relatively  small  cluster. 

2.5.3 Forming Joint Hypotheses. As will be described in Section 2.5.5, to
update  the  estimate  of  the  target  state,  all  joint  association  events  must  be  formed. 
If  there  is  only  a  single  target  in  the  cluster,  then  this  is  a  simple  task,  and  the  asso¬ 
ciation  events  are  that  the  target  is  associated  with  each  measurement  in  the  gate, 
or  that  the  target  is  not  associated  with  any  measurement  (i.e. ,  it  is  hypothesized  to 
have  been  missed). 

If  there  are  two  targets  in  a  cluster,  then  the  number  of  joint  association 
hypotheses  is  roughly  squared  compared  to  the  single  target  case.  If  the  results  of 
the  measurement  gate  tests  are  stored  in  the  “valid”  matrix  such  that  the  (i,  j) 
entry  is  a  binary  flag  indicating  whether  measurement  j  is  inside  the  association 
region for target i, then the pseudocode in Figure 2.9 will form all joint events.

% Loop for first target - associate measurement zero (missed
% detection), then each of the measurements
for m1 = 0 to numMeas do
    % Check that measurement m1 is inside association gate for
    % target 1
    if m1 = 0 or valid(1,m1) then
        % Loop for second target - associate measurement zero (missed
        % detection), then each of the measurements
        for m2 = 0 to numMeas do
            % Check that measurement m2 is inside association gate for
            % target 2 and that measurement m2 is not associated with
            % target 1
            if m2 = 0 or (valid(2,m2) and m1 != m2) then
                % Create a new association event
                associate target 1 with measurement m1
                    and target 2 with measurement m2
            endif
        endfor
    endif
endfor


Figure 2.9. Pseudocode to form all joint association events for two
targets.

In the case of $N_t$ targets, one level of for loop will be required for each target in
the cluster being processed. The easiest way of implementing this for the general case
is to use recursion. The pseudocode shown in Figure 2.10 recursively generates
all  joint  association  events  for  an  arbitrary  number  of  targets.  The  operation  of  the 
code  is  to  associate  each  target  with  every  possible  measurement  recursively  until  all 
targets  have  measurements  associated  with  them,  at  which  stage  the  joint  event  is 
finalized  and  stored  in  whichever  form  is  required  for  the  algorithm  being  utilized. 
As  association  events  are  formed,  the  target-measurement  pairings  for  the  event  are 
progressively  collected  in  the  “assoc”  structure,  which  will  contain  the  complete 
association  list  for  the  event  when  the  recursion  reaches  its  stopping  point. 
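The recursive enumeration described above can be sketched in Python. This is an illustrative translation rather than the author's implementation: it uses zero-based indices, represents a missed detection by `None` instead of measurement zero, and simply collects the events without computing their probabilities. Measurements left unassigned in an event are implicitly attributed to clutter.

```python
def joint_events(valid):
    """Enumerate all joint association events for one cluster.

    valid[t][m] is True when measurement m lies inside the gate of target t.
    Each event is a tuple of (target, measurement) pairs; a measurement of
    None means the target is hypothesized to have been missed.
    """
    n_targets = len(valid)
    n_meas = len(valid[0]) if valid else 0
    events = []

    def associate(t, free, assoc):
        if t == n_targets:
            # Every target has been handled: the joint event is complete.
            events.append(tuple(assoc))
            return
        # Hypothesize a missed detection for target t.
        associate(t + 1, free, assoc + [(t, None)])
        # Pair target t with each free measurement inside its gate.
        for m in sorted(free):
            if valid[t][m]:
                associate(t + 1, free - {m}, assoc + [(t, m)])

    associate(0, frozenset(range(n_meas)), [])
    return events


# Hypothetical cluster: target 0 gates both measurements, target 1 gates
# only measurement 1.
events = joint_events([[True, True],
                       [False, True]])
```

For this cluster five events are produced; the pairing of both targets with measurement 1 is excluded because a measurement is removed from the free set once it has been assigned.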

2.5.4  Joint  Target  State.  I11  the  previous  development,  the  a  priori  prob¬ 
ability  of  the  measurements  was  expressed  using  the  joint  PDF,  and  little  further 
attention  was  paid  to  its  evaluation.  If  the  state  vectors  of  the  targets  are  assumed 
independent,  then  the  PDF  of  the  joint  target  state  (which  is  required  to  perform  a 
Kalman  filter  measurement  update)  can  be  expressed  as: 

Nt 

f{X(k) \Zk-1}  =  \[!{:ci(k)\Zk-1}  (2.64) 

1=1 

where  the  PDF  of  the  state  of  target  i  is  assumed  Gaussian  with  mean  Xi(k\k  —  1) 
and  covariance  P,(k\k  —  1).  In  this  case,  the  a  priori  knowledge  of  the  measurement 
vectors,  conditioned  on  an  association  event  (as  required  for  Eq.  (12.62)).  is  also 
independent  and  can  be  expressed  as: 

Nm 

f{Zk\Ql(k),Nm(k),Zk-1}  =  l[f{zj(k)\Gl(k),Nm(k),Zk-1}  (2.65) 

3= 1 

where  the  PDF  of  the  measurement  associated  with  target  i  is  Gaussian  with  mean 
Hx.i(k\k  —  1)  and  covariance  HPj(/c|fc  —  ljH7  +  R,  and  the  PDFs  of  measurements 
hypothesized  to  be  the  result  of  clutter  are  uniform  as  per  Eq.  (2.56).  I11  this  case, 
function associate(freeMeasurements, freeTargets, assoc)

    % Function is called initially with freeMeasurements containing a
    % listing of all measurements in the cluster, freeTargets containing
    % a listing of all targets in the cluster, and assoc containing an
    % empty list which will be used to construct the association events
    % recursively

    if freeTargets list is empty then

        % Association event is complete: store in appropriate global
        % structure
        calculate probability of joint event described by assoc
        add assoc to the list of joint association events

    else

        % Select the first free target to associate, delete from free list
        t = first entry in freeTargets
        newFreeTargets = freeTargets with first entry deleted

        % Associate target with measurement zero, i.e., hypothesize
        % that a missed detection occurred for the target
        assocNew = assoc with appended entry (t,0)
        associate(freeMeasurements, newFreeTargets, assocNew)

        % Associate all remaining measurements with target
        for j = 1 to length of freeMeasurements do
            m = j-th element of freeMeasurements

            % Check that measurement is inside target's association gate
            if valid(t,m) then

                % Create new list of free measurements
                newFreeMeasurements =
                    freeMeasurements with j-th element deleted

                % Create updated association list
                assocNew = assoc with appended entry (t,m)
                associate(newFreeMeasurements, newFreeTargets, assocNew)
            endif
        endfor
    endif


Figure  2.10.  Pseudocode  to  form  all  joint  association  events  for  an 
arbitrary  number  of  targets. 
the estimates conditioned on a given association event can be updated from sample
period $(k-1)$ to sample period $k$ with the standard Kalman filter update equation
of Eq. (2.8), using the measurement assigned to the target in the association event
$\Theta_l(k)$ and the measurement matrix $H$. Targets which are hypothesized to have been
missed under the association event are left unchanged.
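The recursion of Figure 2.10 can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work: the function name `enumerate_joint_events`, the `valid` gating callback, and the use of measurement identifier 0 for a missed detection (so real measurement identifiers must be nonzero) are assumptions made for the example. The probability calculation for each completed event is omitted.

```python
def enumerate_joint_events(measurements, targets, valid):
    """Return every joint association event as a list of (target, measurement) pairs.

    A measurement identifier of 0 denotes a hypothesized missed detection.
    Measurements left unassigned in an event are implicitly treated as clutter.
    """
    events = []

    def associate(free_measurements, free_targets, assoc):
        if not free_targets:
            # Association event is complete: store it
            events.append(assoc)
            return
        t, remaining_targets = free_targets[0], free_targets[1:]
        # Hypothesize a missed detection for target t
        associate(free_measurements, remaining_targets, assoc + [(t, 0)])
        # Hypothesize each gated free measurement as originating from target t
        for j, m in enumerate(free_measurements):
            if valid(t, m):
                remaining = free_measurements[:j] + free_measurements[j + 1:]
                associate(remaining, remaining_targets, assoc + [(t, m)])

    associate(list(measurements), list(targets), [])
    return events


# Two targets, two measurements, every measurement inside every gate
events = enumerate_joint_events([1, 2], ["A", "B"], lambda t, m: True)
```

With two targets and two measurements that all fall inside both gates, the sketch produces seven joint events, including the event in which both targets are missed.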


In the case in which target state vectors are correlated,¹¹ the entire update
must be performed in one step, using an augmented measurement matrix $\mathbb{H}$. As an
example, consider the same joint association event used in Section 2.5.2:


$$\Theta_l(k) = \{\theta_{12},\ \theta_{21},\ \theta_{34},\ \theta_{40}\}$$


where we recall that the elemental event $\theta_{ji}$ represents the association of
measurement $j$ with target $i$. The sample event therefore indicates that measurement 1
has been associated with target 2, measurement 2 with target 1, and measurement 3 with
target 4, and that measurement 4 is the result of clutter. There were four targets under
track, hence target 3 was missed. The joint target state is in block form, with a
single block for each target:


$$X(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \\ x_4(k) \end{bmatrix}$$


¹¹ The motivation for admitting correlation between targets will become apparent in Section 2.5.8.
Similarly, the joint measurement vector is in block form, with a single block for each
measurement:

$$Z_k = \begin{bmatrix} z_1(k) \\ z_2(k) \\ z_3(k) \\ z_4(k) \end{bmatrix}$$


Given that measurement 4 is hypothesized to be the result of clutter, under the
association event $\Theta_l(k)$ we discard it, defining the modified target-originated
measurement vector as:

$$Z_k' = \begin{bmatrix} z_1(k) \\ z_2(k) \\ z_3(k) \end{bmatrix}$$


Denoting $H$ as the matrix which describes the relationship between a single
measurement and the state of a single target, the block measurement matrix $\mathbb{H}$
which describes the relationship between the joint measurement vector and the joint
target state vector for this association event is:

$$\mathbb{H}(\Theta_l(k)) = \begin{bmatrix} 0 & H & 0 & 0 \\ H & 0 & 0 & 0 \\ 0 & 0 & 0 & H \end{bmatrix}$$

such that:

$$Z_k' = \mathbb{H}(\Theta_l(k))\,X(k) + \mathbf{V}(k)$$

and thus the standard Kalman filter update expression of Eq. (2.8) is employed using
these augmented structures to find the updated joint target state conditioned on the
particular association event $\Theta_l(k)$. To evaluate Eq. (2.62) when correlation exists
between targets, we use the expression:

$$f\{Z_k\,|\,\Theta_l(k), N_m(k), Z^{k-1}\} = \mathcal{N}\{Z_k';\ \mathbb{H}\hat{X},\ \mathbb{H}\mathbb{P}\mathbb{H}^T + \mathbb{R}\}\,V^{-\phi} \tag{2.66}$$
where $\mathbb{P}$ is the matrix containing the covariance of the joint target state estimate $\hat{X}$,
$\mathbb{R}$ is the block-diagonal matrix containing the covariance of the augmented
measurement noise $\mathbf{V}(k)$, and $V$ is the volume of the combined gating region.¹² The
latter term incorporates the uniform density of clutter-originated measurements (of count
$\phi$), as discussed in Eq. (2.56).
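As an illustration of how the block measurement matrix follows from an association event, the sketch below assembles it row block by row block using plain nested lists. The function name `block_measurement_matrix`, the representation of an event as (measurement, target) pairs with target 0 denoting clutter, and the position-only single-target matrix $H = [1\ \ 0]$ are assumptions made for the example.

```python
def block_measurement_matrix(event, num_targets, H):
    """Build the block measurement matrix for one joint association event.

    event: list of (j, i) pairs (measurement j, target i); i == 0 marks clutter.
    H: single-target measurement matrix as a list of rows.
    Rows appear in measurement order; clutter measurements contribute no rows.
    """
    n = len(H[0])                      # single-target state dimension
    rows = []
    for j, i in sorted(event):
        if i == 0:
            continue                   # clutter-originated measurement is discarded
        for h_row in H:
            row = [0.0] * (n * num_targets)
            start = (i - 1) * n        # block column of target i
            row[start:start + n] = h_row
            rows.append(row)
    return rows


# Event {theta_12, theta_21, theta_34, theta_40} with a position/velocity state
H = [[1.0, 0.0]]
HH = block_measurement_matrix([(1, 2), (2, 1), (3, 4), (4, 0)], 4, H)
```

For the sample event, the three resulting rows place $H$ in block columns 2, 1 and 4 respectively, matching the block structure shown in the text.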


2.5.5  State Update. The PDF of the joint target state stored by the
tracking system at the end of sample period $(k-1)$ is denoted by $f\{X(k-1)|Z^{k-1}\}$.
Assuming that the prior state PDF is a single Gaussian function, the standard linear
propagation model presented in Section 2.2.3 can be used to propagate the PDF to
the $k$-th sample period, resulting in $f\{X(k)|Z^{k-1}\}$. The measurements from the
$k$-th sample period are then incorporated, resulting in the new PDF $f\{X(k)|Z^k\}$,
which has the same form as the original PDF but applies one sample period later; thus
the process can be repeated recursively. The probability of the joint association
event, as developed in Section 2.5.2, is utilized to perform this state update using
the total probability expansion:


$$f\{X(k)|Z^k\} = \sum_l f\{X(k)|Z^k, \Theta_l(k)\}\,P\{\Theta_l(k)|Z^k\} \tag{2.67}$$


The expression $f\{X(k)|Z^k, \Theta_l(k)\}$ represents the updated joint target state
conditioned on the new measurement history and a specific association event. If
the prior target state density was a single Gaussian PDF, then this is easily calculated
using the standard Kalman filter update equations as per Eq. (2.8), with the
augmented joint measurement matrix $\mathbb{H}$, as described in Section 2.5.4.

¹² Note the difference in notation between the boldface $\mathbf{V}(k)$, which is the vector of the augmented
measurement noise for all target-oriented measurements, and $V$ (not boldface), which is the scalar
volume of the combined gating region.

Even if the original joint target density was a single Gaussian PDF, the updated
density of Eq. (2.67) is a Gaussian mixture, with one component for each joint
association event. Accordingly, some means is necessary to perform this measurement
update when the input is a Gaussian mixture, rather than a single Gaussian
function. If each component of the Gaussian mixture at the input of the update cycle
is interpreted as the result of an earlier association hypothesis, denoted $\Psi_u(k-1)$,¹³
then the PDF at sample period $(k-1)$ can be expanded as:


$$f\{X(k-1)|Z^{k-1}\} = \sum_u f\{X(k-1)|Z^{k-1}, \Psi_u(k-1)\}\,P\{\Psi_u(k-1)|Z^{k-1}\} \tag{2.68}$$

Each component of the PDF in Eq. (2.68) can be propagated using the same
linear propagation equations as a single Gaussian function to find the set of component
Gaussian functions $\{f\{X(k)|Z^{k-1}, \Psi_u(k-1)\}\}$. The state update will then be
performed by modifying the PDF update equation of Eq. (2.67) to:


$$f\{X(k)|Z^k\} = \sum_l \sum_u f\{X(k)|Z^k, \Theta_l(k), \Psi_u(k-1)\} \cdot P\{\Theta_l(k)|Z^k, \Psi_u(k-1)\}\,P\{\Psi_u(k-1)|Z^{k-1}\} \tag{2.69}$$


In this expression the term $f\{X(k)|Z^k, \Theta_l(k), \Psi_u(k-1)\}$ once again represents
the update of a single Gaussian PDF using a single association event, hence the
standard Kalman filter update equation can be used, and the result is again a single
Gaussian. The result of Eq. (2.69) is therefore another Gaussian mixture, with a
single component for each $\{\Psi_u(k-1), \Theta_l(k)\}$ pair, i.e., each previous hypothesis is
updated using each current association event. The number of components in the new
mixture is equal to the number of previous components multiplied by the number of
current association hypotheses.


¹³ The notation $\Psi_u(k-1)$ is used to distinguish the previous hypotheses, which are association
histories, from the latest single-event association hypotheses, $\Theta_l(k)$. For example, a single
association history event $\Psi_u(k-1)$ may consist of the history $\{\Theta_5(1), \Theta_{16}(2), \Theta_7(3), \ldots\}$, in which the
element $\Theta_l(k)$ indicates that the association history hypothesizes the joint association event $\Theta_l$ at
sample time $k$. The variable 'u' was chosen arbitrarily for the index so as not to conflict with other
notation in this document.
The double summation of Eq. (2.69) can be expressed as an equivalent single
summation for propagation to the next processing cycle. The new mixture will then
be represented as:

$$f\{X(k)|Z^k\} = \sum_{u'} f\{X(k)|Z^k, \Psi_{u'}(k)\}\,P\{\Psi_{u'}(k)|Z^k\} \tag{2.70}$$

where the new indexing $u'$ covers all new mixture components, with:

$$f\{X(k)|Z^k, \Psi_{u'}(k)\} = f\{X(k)|Z^k, \Theta_l(k), \Psi_u(k-1)\}$$

$$P\{\Psi_{u'}(k)|Z^k\} = P\{\Theta_l(k)|Z^k, \Psi_u(k-1)\}\,P\{\Psi_u(k-1)|Z^{k-1}\}$$

This strategy is the optimal Bayesian data association solution, and the transition
from Eq. (2.68) to Eqs. (2.69) and (2.70) reveals the major problem associated
with it: at each sample period, the previous number of hypotheses is multiplied by
the number of joint association hypotheses in the current sample period. The number
of hypotheses to be maintained therefore grows exponentially, at a rate set by the
number of joint association hypotheses in each sample period. Thus the optimal
Bayesian solution is clearly intractable, and some form of simplification will be
necessary to reduce the number of components in the Gaussian mixture to a
manageable level.
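The bookkeeping behind this growth can be illustrated with a short sketch that forms the weight of every (previous hypothesis, current event) pair as the product of the two probabilities, as in the weight expression accompanying Eq. (2.70). The function name `expand_hypotheses` and the numerical probabilities are assumptions chosen for illustration; the Kalman update of each component's mean and covariance is omitted.

```python
def expand_hypotheses(prior_weights, event_probs):
    """Pair every previous hypothesis u with every current joint event l.

    prior_weights[u]  : P{Psi_u(k-1) | Z^(k-1)}
    event_probs[u][l] : P{Theta_l(k) | Z^k, Psi_u(k-1)}
    Returns a flat list of ((u, l), weight), one entry per new component u'.
    """
    return [((u, l), w_u * p_l)
            for u, w_u in enumerate(prior_weights)
            for l, p_l in enumerate(event_probs[u])]


prior = [0.6, 0.4]                                   # two previous hypotheses
event_probs = [[0.5, 0.3, 0.2], [0.7, 0.2, 0.1]]     # three joint events each
components = expand_hypotheses(prior, event_probs)   # 2 x 3 = 6 new components

# With a constant 7 joint events per scan, the count after k scans is 7**k
growth = [7 ** k for k in range(1, 6)]
```

Two prior hypotheses and three joint events already give six components after one scan; at seven joint events per scan, a single initial hypothesis grows to 16807 components after only five scans.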

2.5.6  Global  Nearest  Neighbor.  Possibly  the  easiest  way  of  addressing  the 
problem  of  the  increasing  number  of  hypotheses  would  be  simply  to  take  the  Gaus¬ 
sian  mixture  component  corresponding  to  the  most  likely  hypothesis  and  discard 
the  rest  of  the  mixture,  leaving  only  a  single  component.  At  each  sample  period, 
the  PDF  of  target  state  propagated  from  the  previous  sample  period  will  be  a  sin¬ 
gle  Gaussian  PDF,  hence  the  update  process  consists  of  calculating  the  probability 
of  all  joint  association  events,  and  then  updating  the  joint  target  state  with  the 
most  likely  hypothesis.  This  algorithm  is  referred  to  as  Global  Nearest  Neighbor 
(GNN), to indicate that the best global (i.e., joint) association hypothesis is to be
selected [7:338-342].

If  only  a  single  target  is  present,  then  the  equations  of  the  joint  association 
events  will  be  very  similar  to  each  other,  and  the  algorithm  can  be  simplified  to 
the  standard  Nearest  Neighbor  (i.e.,  not  global).  The  simplified  algorithm  performs 
the  association  of  the  measurement  with  the  smallest  distance,  according  to  the 
Mahalanobis  distance  measure,  similar  to  the  exponent  of  a  Gaussian  PDF: 

$$d_j^2 = (z_j - \hat{z})^T S^{-1} (z_j - \hat{z}) \tag{2.71}$$

where $z_j$ is the $j$-th measurement, $\hat{z}$ is the predicted measurement for the single
target, and $S = HPH^T + R$ is the predicted covariance of the residual formed with
the correct measurement. Comparing Eq. (2.71) with Eq. (2.14) reveals that the
square of the Mahalanobis distance is actually the normalized residual quadratic,
which, in Section 2.4, was used to indicate how well the tracking model matched the
measurements, and is now used to indicate how well the measurements match the
tracking model.
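A minimal sketch of single-target nearest neighbor association using the distance of Eq. (2.71) is given below for two-dimensional measurements. The function names, the explicit 2x2 inverse, and the gate value of 9.21 (approximately the 99% point of a chi-square distribution with two degrees of freedom) are assumptions made for the example.

```python
def mahalanobis_sq(z, z_pred, S):
    """Squared Mahalanobis distance for 2-D vectors with 2x2 covariance S."""
    dx, dy = z[0] - z_pred[0], z[1] - z_pred[1]
    (a, b), (c, d) = S
    det = a * d - b * c
    # S^-1 = (1/det) * [[d, -b], [-c, a]]
    return (dx * (d * dx - b * dy) + dy * (-c * dx + a * dy)) / det


def nearest_neighbor(measurements, z_pred, S, gate=9.21):
    """Index of the gated measurement with the smallest distance, or None."""
    best, best_d2 = None, gate
    for j, z in enumerate(measurements):
        d2 = mahalanobis_sq(z, z_pred, S)
        if d2 <= best_d2:
            best, best_d2 = j, d2
    return best


S = [[4.0, 0.0], [0.0, 1.0]]            # residual is less certain along x
measurements = [(2.0, 0.0), (0.0, 1.5), (3.0, 3.0)]
chosen = nearest_neighbor(measurements, (0.0, 0.0), S)
```

In this example the second measurement is closest in Euclidean terms, but the first is selected because the residual covariance is larger along the first axis; the third measurement falls outside the gate.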

Nearest neighbor association techniques are sometimes referred to as hard assignment
methods, indicating that hard decisions have been made: the system assigns
target-measurement associations, and proceeds with processing on the assumption that
the assignments were indeed correct. The following sections describe techniques
which use probabilistically weighted (soft) decisions. The performance of hard
assignment methods is limited. As highlighted by Streit and Luginbuhl [53:1],
the hard decisions associated with techniques such as GNN introduce opportunities
for decision mistakes, and hence necessarily increase estimation error. Intuitively,
one can see that much of the information carried by the joint target PDF is being
discarded, hence logically one would expect the success of the method to be limited.
2.5.7  Probabilistic Data Association. The single target Probabilistic Data
Association (PDA) algorithm [2:163-170] and its multiple target extension, the Joint
Probabilistic Data Association (JPDA) algorithm [JJ310-319], are two more techniques
which reduce the joint target state PDF down to a single mixture component
at the end of each sample period. Rather than taking the most likely association
hypothesis at each processing interval, these techniques take the weighted average of
all association hypotheses.

Thus, the approximation inherent to the PDA/JPDA algorithm is:

$$f\{X(k)|Z^k\} = \sum_l f\{X(k)|Z^k, \Theta_l(k)\}\,P\{\Theta_l(k)|Z^k\} \approx \mathcal{N}\{X(k);\ \hat{X}(k|k),\ \mathbb{P}(k|k)\} \tag{2.72}$$

where $\hat{X}(k|k)$ is the weighted average of the means of the Gaussian sum according
to Eq. (2.22), and $\mathbb{P}(k|k)$ is the weighted average of the covariances of the Gaussian
sum according to Eq. (2.23):

$$\hat{X}(k|k) = \sum_l P\{\Theta_l(k)|Z^k\}\,\hat{X}(k|k, \Theta_l(k))$$

$$\mathbb{P}(k|k) = \sum_l P\{\Theta_l(k)|Z^k\}\Big[\mathbb{P}(k|k, \Theta_l(k)) + \big(\hat{X}(k|k, \Theta_l(k)) - \hat{X}(k|k)\big)\big(\hat{X}(k|k, \Theta_l(k)) - \hat{X}(k|k)\big)^T\Big]$$

Unless it is explicitly prevented, the combined covariance of Eq. (2.72) will
have correlation between targets, as induced by the "spreading of the means" terms
on the final line of the above expression. The implication of this is discussed further
in Section 2.5.8; the JPDA algorithm discards any correlation between targets in
order to reduce computational complexity.

The equations above represent one possible method of implementing the JPDA
algorithm. However, because all estimates are combined into a single overall mean
and covariance at each sample period, and correlation terms are discarded, the
Kalman weights and covariances for a given target under each event will be identical,
and thus the implementation can be optimized substantially.

The common implementation of the JPDA algorithm uses the following algebraically
equivalent equations to update the state estimate and covariance of target $i$:

$$\hat{x}_i(k|k) = \hat{x}_i(k|k-1) + K_i(k)\,\nu_i(k)$$

$$P_i(k|k) = P_i(k|k-1) - \alpha_i K_i(k) H P_i(k|k-1) + \tilde{P}_i(k) \tag{2.73}$$

where $\nu_i(k)$ represents the combined residual for target $i$ and $\tilde{P}_i(k)$ represents the
spreading of the variance due to the combination of multiple Gaussian components:

$$\nu_i(k) = \sum_{j=1}^{N_m} \beta_{ji}\,\nu_{ji}(k)$$

$$\nu_{ji}(k) = z_j(k) - H\hat{x}_i(k|k-1)$$

$$\alpha_i = \sum_{j=1}^{N_m} \beta_{ji}$$

$$\tilde{P}_i(k) = K_i(k)\left[\sum_{j=1}^{N_m} \beta_{ji}\,\nu_{ji}(k)\,\nu_{ji}(k)^T - \nu_i(k)\,\nu_i(k)^T\right]K_i(k)^T$$

$\nu_{ji}(k)$ represents the residual formed with measurement $j$ and target $i$, and $\beta_{ji}$ is the
combined probability of all events in which measurement $j$ is associated with target
$i$:

$$\beta_{ji} = \sum_{l:\,\theta_{ji} \in \Theta_l(k)} P\{\Theta_l(k)|Z^k\}$$

The PDA/JPDA algorithm has been applied to a vast array of problems in
open literature, and has proven itself to be very effective in less demanding tracking
environments (for example, [4:320-327]). In more demanding tracking problems
(such as high clutter density and targets which remain close for extended periods of
time), the simplification applied to the joint target state PDF begins to prove too
much [27], and the more detailed representations described in the following sections
are necessary.

Figure 2.11. One-dimensional multiple target data association example.
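The marginal association probabilities used by the common JPDA implementation can be accumulated directly from a list of joint events, as in the sketch below. The function names, the event representation as (measurement, target) pairs with target 0 denoting clutter, and the scalar residual values are assumptions made for illustration; the joint event probabilities are taken as given.

```python
from collections import defaultdict


def marginal_association_probs(events, event_probs):
    """beta[(j, i)]: total probability of all joint events containing theta_ji."""
    beta = defaultdict(float)
    for event, prob in zip(events, event_probs):
        for j, i in event:
            if i != 0:                 # clutter assignments do not update any target
                beta[(j, i)] += prob
    return dict(beta)


def combined_residual(beta, residuals, target):
    """nu_i = sum over j of beta_ji * nu_ji (scalar residuals for simplicity)."""
    return sum(b * residuals[(j, i)] for (j, i), b in beta.items() if i == target)


# One target, two measurements: missed detection, or either measurement assigned
events = [[(1, 0), (2, 0)], [(1, 1), (2, 0)], [(1, 0), (2, 1)]]
event_probs = [0.1, 0.6, 0.3]
beta = marginal_association_probs(events, event_probs)
alpha_1 = sum(b for (j, i), b in beta.items() if i == 1)
nu_1 = combined_residual(beta, {(1, 1): 1.0, (2, 1): -2.0}, target=1)
```

Here the quantity `alpha_1` is 0.9 (the probability that the target was detected at all), and the two candidate residuals happen to cancel in the combined residual, so the state estimate would not move even though the covariance would be inflated by the spreading term.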


2.5.8  Correlation Between Targets. Although it initially seems unusual
to allow correlation to develop between the state estimates of two physically independent
targets, detailed consideration of the joint PDF of target state reveals the
motivation for doing so, and the potential benefit that may be obtained. The following
one-dimensional tracking example, illustrated in Figures 2.11 and 2.12, helps
to explain.

Figure 2.11(a) shows the a priori position of the two targets, marked by V,
and the two newly received measurements, marked by 'x'. Considering only association
events in which each target is associated with a single measurement, there
are two possible associations: either each target will be associated with the measurement
closer to it, or each target will be associated with the measurement farther
from it. The first joint association event is illustrated in Figure 2.11(b), in which
the rounded arrows indicate the associations, and the gray dots indicate the updated
state of the targets, moved toward the measurements used to update them. The second
association event is illustrated similarly in Figure 2.11(c): the greater disparity
between each target-measurement pairing will tend to produce a larger update in
the position of each target; in practice the probability of this event will be smaller
as the associations are less likely.

The updated states corresponding to the two association events of Figures
2.11(b) and (c) are illustrated in joint target space in Figure 2.12(a). The updated
state corresponding to each possible joint association event maps to a point in the
joint target space, and the resultant joint target PDF will consist of a Gaussian sum
with weighted Gaussian functions at each of these points, and different covariance
matrices determining the spread about these points. Under the approximation of
JPDA, these joint hypotheses are to be represented by a single Gaussian PDF. If
this simplified PDF is forced to be independent between targets, then the resultant
function will be as illustrated in Figure 2.12(b): the coordinates of the covariance
must be aligned with the target state coordinate systems, hence a broad approximation
is necessary, representing a great loss of information. If correlation is allowed
between targets, then the covariance takes the form illustrated in Figure 2.12(c): the
marginal covariance in each target coordinate system remains identical, but the high
degree of correlation between the coordinate systems greatly increases the information
retained.
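The effect can be reproduced numerically by moment-matching two joint hypotheses in the two-target joint space, following the mean and covariance expressions given with Eq. (2.72). The positions, weights and component variances below are invented for illustration, and the function name `merge_joint_hypotheses` is an assumption; the weights are assumed to sum to one.

```python
def merge_joint_hypotheses(weights, means, covs):
    """Moment-matched single Gaussian for a mixture of joint-state hypotheses."""
    n = len(means[0])
    mean = [sum(w * m[a] for w, m in zip(weights, means)) for a in range(n)]
    cov = [[sum(w * (P[a][b] + (m[a] - mean[a]) * (m[b] - mean[b]))
                for w, m, P in zip(weights, means, covs))
            for b in range(n)]
           for a in range(n)]
    return mean, cov


# Joint state is [x1, x2]. Event 1: each target takes the nearer measurement.
# Event 2: each target takes the farther one, so x1 moves right and x2 moves left.
weights = [0.7, 0.3]
means = [[-0.2, 1.2], [0.6, 0.4]]
covs = [[[0.04, 0.0], [0.0, 0.04]]] * 2   # uncorrelated within each hypothesis

mean, cov = merge_joint_hypotheses(weights, means, covs)
correlation = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5
```

Although each hypothesis is uncorrelated between targets, the merged covariance has a cross term of about -0.134 and a correlation coefficient of roughly -0.77. Forcing independence would keep the same diagonal entries but set this cross term to zero.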

The intuitive understanding of the benefit of allowing covariance such as that
illustrated in Figure 2.12(c) is this: if later measurements confirm that target 2 was
in fact further to the right, then this indicates that target 1 was actually further
to the left. Likewise, if later measurements indicate that target 1 was further to
the left, then this serves to confirm that target 2 was actually further to the right.
In this way, correlation between hypotheses allows later measurements to resolve
uncertainties left over from earlier processing cycles, much like the deferred
decision-making capability of the Multiple Hypothesis Tracker described in Section 2.5.10.

Joint  Probabilistic  Data  Association  Coupled  (JPDAC)  and  Coupled  Proba¬ 
bilistic  Data  Association  (CPDA)  are  two  extensions  of  JPDA  which  allow  correlation 
to  develop  between  targets  for  periods  during  which  targets  are  in  the  same  region 
(i.e., cluster). JPDAC [3:328-329] was the initial implementation of the concept,
allowing  correlation  to  develop  between  state  estimates  in  a  cluster  containing  two 
targets.  In  the  reference  cited,  however,  there  is  no  mention  of  how  to  approach  joint 
association  events  in  which  detection  of  one  or  both  of  the  targets  is  hypothesized  to 
have  been  missed.  Such  events  may  be  of  minor  concern  if  the  probability  of  detec¬ 
tion  is  close  to  unity  and  the  association  gate  is  selected  to  be  very  large.  However, 
if  the  probability  of  detection  becomes  significantly  less  than  unity,  such  an  omission 
can  have  a  devastating  impact  on  the  performance  of  the  system. 

The CPDA algorithm described in [9, 32] is a full implementation of the approximation
of Eq. (2.72) admitting correlation between targets. Implementation
of this algorithm directly (without calculating the full mean and covariance individually
for each hypothesis before merging) is rather difficult, and necessitates the
somewhat opaque notation found in these articles.

Correlation in PDA implementations was initially developed to fix the problem
of tracks belonging to nearby targets tending to coalesce into a single track midway
between the two. However, as discussed in [12], both JPDAC and CPDA perform
more poorly in this respect than the original uncorrelated JPDA algorithm. The
explanation of this phenomenon provided in the cited article is that CPDA develops
strong correlation between the targets, hence it tends to keep the two tracks together
between competing measurements. However, as illustrated in the example of Figure
2.12, the coupling which develops between targets is almost guaranteed to be negative
correlation, hence such an explanation would seem unsatisfactory. The causes of bias
and coalescence in PDA algorithms are examined in depth in Section 3.2.

2.5.9  Maximum Likelihood Methods. The systems described in [25, 26] bear
much  resemblance  to  the  CPDA  technique  described  above:  they  both  reduce  the  dis¬ 
tribution  of  the  target  to  a  single  Gaussian  mixture,  and  they  both  explicitly  model 
the  correlation  which  develops  between  targets.  Rather  than  using  the  Gaussian 
function  with  the  parameters  derived  as  the  weighted  mean  over  all  possible  associa¬ 
tions,  these  methods  instead  select  the  state  estimate  as  the  value  which  maximizes 
the  likelihood  of  receiving  the  measurements  (considering  all  possible  associations), 
with  the  covariance  evaluated  using  the  Fisher  information  matrix.  Other  novel  in¬ 
clusions  of  these  techniques  are  that  they  consider  association  over  several  sets  of 
measurements  [25],  and  that  they  propose  an  approximation  of  the  Kronecker  delta 
function  which  avoids  the  necessity  of  generating  all  joint  association  events  [26]. 

2.5.10  Multiple Hypothesis Tracking. The Multiple Hypothesis Tracker
(MHT) is an algorithm that has been discussed in literature in many different forms,
starting with [30] and [39]. The basic concept of the algorithm is to maintain hypotheses
for every plausible association event; each hypothesis consists of the probability
of the event, and the mean and covariance of the target state conditioned
on the event. In this way, the algorithm essentially maintains the Gaussian mixture
representation of the PDF of target state as developed in Section 2.5.5. To alleviate
the exponential explosion of hypotheses, pruning and merging algorithms are applied
to the hypothesis tree to eliminate those hypotheses that become implausible, and
merge those which produce similar results.

The  original  presentation  of  the  multiple  target  algorithm  [3D]  proceeds  in  a 
measurement  oriented  manner,  whereby  the  algorithm  is  driven  by  considering  the 
possible origins of each measurement individually. The growth of association hypotheses
inherently follows a tree structure [6:285], in which each leaf node indicates
a hypothesis for the existence and location of the targets at the current time instant.
Measurements  from  the  same  scan  are  processed  together  to  avoid  assigning  two 
measurements  to  a  target.  When  considering  each  measurement,  there  are  always 
at  least  two  possibilities:  the  measurement  may  be  a  false  alarm  (as  from  clutter), 
or  it  may  represent  a  new  target,  hence  each  measurement  generates  at  least  two 
new  nodes  for  each  entry  in  the  hypothesis  list.  For  hypotheses  that  contain  existing 
tracks,  the  measurement  may  also  represent  a  continuation  of  each  of  these  (assum¬ 
ing  that  the  measurement  gate  is  satisfied),  hence  more  hypotheses  can  potentially 
be  generated. 

A  more  readily  understood  track-oriented  development  of  the  MHT  algorithm 
is presented in [30], and similarly in [8], where it was termed the Structured
Branching Multiple Hypothesis Tracker (SB-MHT). The structure of this algorithm is to
create  single-target  hypotheses  for  each  of  the  possible  measurements  with  which  a 
target  can  be  associated  at  each  processing  cycle.  One  accounts  for  joint  hypotheses 
by  maintaining  lists  of  compatible  single-target  hypotheses,  providing  an  efficient 
means  of  keeping  track  of  a  large  number  of  joint  hypotheses,  each  pointing  to  a 
series  of  single  target  hypotheses  containing  the  target  parameters. 

The key step in ensuring the performance and computability of an MHT algorithm
is efficient hypothesis pruning and merging, yet the majority of the algorithms
used are based largely on ad hoc methods. Several different pruning methods are
suggested  in  p291],  such  as  deleting  those  hypotheses  whose  probabilities  are  less 
than  a  certain  threshold,  retaining  the  Nh  most  likely  hypotheses,  or  retaining  the 
most  likely  hypotheses  such  that  the  total  probability  of  the  set  retained  is  greater 
than  some  threshold  (close  to  unity).  Merging  of  hypotheses  may  be  performed  on 
the  basis  of  shared  measurements  over  a  period  of  time  (e.g.,  if  the  associations  in 
two  hypotheses  are  identical  over  the  last  three  scans,  they  are  merged),  or  after  di¬ 
rect  state  comparison.  Merging  may  be  performed  either  using  replacement  (deleting 
the lower probability hypothesis and adding its probability to the other), or by some
form of weighted averaging.
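The three pruning rules mentioned above are simple to state in code. The sketch below is illustrative only: hypotheses are represented as (weight, label) pairs, the function names and threshold values are assumptions, and the surviving weights are renormalized after pruning, which is a design choice not specified in the text.

```python
def renormalize(hyps):
    """Rescale the retained hypothesis weights so that they sum to one."""
    total = sum(w for w, _ in hyps)
    return [(w / total, h) for w, h in hyps]


def prune_by_threshold(hyps, min_weight):
    """Delete hypotheses whose probability is below a fixed threshold."""
    return renormalize([(w, h) for w, h in hyps if w >= min_weight])


def prune_to_n_best(hyps, n):
    """Retain the n most likely hypotheses."""
    ranked = sorted(hyps, key=lambda wh: wh[0], reverse=True)
    return renormalize(ranked[:n])


def prune_by_cumulative(hyps, total_prob):
    """Retain the most likely hypotheses until their total probability is reached."""
    kept, cumulative = [], 0.0
    for w, h in sorted(hyps, key=lambda wh: wh[0], reverse=True):
        kept.append((w, h))
        cumulative += w
        if cumulative >= total_prob:
            break
    return renormalize(kept)


hyps = [(0.5, "a"), (0.3, "b"), (0.15, "c"), (0.04, "d"), (0.01, "e")]
kept_threshold = prune_by_threshold(hyps, 0.05)
kept_best = prune_to_n_best(hyps, 2)
kept_cumulative = prune_by_cumulative(hyps, 0.9)
```

For this weight set, the threshold rule and the cumulative rule both retain the first three hypotheses, while the N-best rule with N = 2 retains only the first two.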

2.5.11  Controlling  the  Number  of  Hypotheses.  Of  the  algorithms  discussed, 
the  MHT  is  the  only  algorithm  which  is  able  to  maintain  more  than  a  single  associ¬ 
ation  hypothesis  between  measurement  intervals.  Although  the  MHT  represents  the 
state-of-the-art  in  modern  target  tracking,  the  most  vital  task  to  the  algorithm,  selec¬ 
tion  of  which  hypotheses  to  retain,  is  largely  ad  hoc  in  most  implementations.  Most 
commonly,  a  Maximum  Likelihood  strategy  is  adopted  whereby  the  Nh  most  likely 
tracks  (or  all  tracks  with  probabilities  that  exceed  a  given  threshold)  are  maintained, 
and  the  remainder  deleted. 

Few  merging  strategies  are  discussed  in  open  literature:  the  most  common  is 
n-scan  merging  [59],  which  merges  the  hypotheses  that  incorporate  identical  mea¬ 
surement  histories  over  the  last  n  processing  cycles,  effectively  limiting  the  maximum 
number  of  processing  cycles  for  which  decision  making  can  be  deferred.  As  the  length 
of  the  memory  is  increased,  the  merging  has  less  impact:  the  average  number  of  hy¬ 
potheses  will  increase  exponentially  with  the  length  of  the  memory,  and  will  soon 
need  to  be  controlled  by  deleting  less  likely  hypotheses,  returning  us  largely  to  the 
Maximum  Likelihood  pruning  strategy  where  we  started. 
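A minimal sketch of n-scan merging is given below for scalar states. Each hypothesis carries its association history as a tuple of event indices; hypotheses sharing the last n entries are combined. The tuple layout, the function name `n_scan_merge`, and the use of moment matching (rather than replacement) to combine the merged components are assumptions made for the example.

```python
from collections import defaultdict


def n_scan_merge(hyps, n):
    """Merge hypotheses whose association histories agree over the last n scans.

    hyps: list of (weight, history, mean, variance) with scalar mean and variance.
    Returns one moment-matched hypothesis per distinct n-scan history.
    """
    groups = defaultdict(list)
    for w, history, mean, var in hyps:
        groups[tuple(history[-n:])].append((w, mean, var))

    merged = []
    for recent, members in groups.items():
        w = sum(m[0] for m in members)
        mean = sum(m[0] * m[1] for m in members) / w
        # Weighted variances plus the spreading-of-the-means term
        var = sum(m[0] * (m[2] + (m[1] - mean) ** 2) for m in members) / w
        merged.append((w, recent, mean, var))
    return merged


hyps = [
    (0.4, (1, 2, 1), 0.0, 1.0),
    (0.1, (2, 2, 1), 0.5, 1.0),
    (0.3, (1, 1, 2), 3.0, 1.0),
    (0.2, (2, 1, 2), 3.4, 1.0),
]
merged_2 = n_scan_merge(hyps, 2)   # histories agree pairwise over the last 2 scans
merged_3 = n_scan_merge(hyps, 3)   # full histories all differ: nothing merges
```

With n = 2 the four hypotheses collapse to two; with n = 3 none are merged, which illustrates why a longer memory reduces the impact of the merging.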

Other  merging  methods  based  on  similarity  of  target  state  distributions  have 
been  suggested  (6)  |7|  3D]  but  the  little  detail  given  indicates  that  the  simplifications 
rely  on  ad  hoc  state  comparisons  such  as  p2293]: 

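$$|\{\hat{x}_1\}_i - \{\hat{x}_2\}_i| < \beta\sqrt{\{P_1\}_{ii} + \{P_2\}_{ii}} \quad \forall\, i$$

$$\{P_1\}_{ii} < \gamma\,\{P_2\}_{ii} \quad \forall\, i$$

$$\{P_2\}_{ii} < \gamma\,\{P_1\}_{ii} \quad \forall\, i$$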
with $\beta = 0.1$ and $\gamma = 2.0$. Interpreting the algorithm, in order to be merged, the
state estimates of the hypotheses must be within roughly 0.1 standard deviations of
each other,¹⁴ and the covariance trace elements must differ by no more than a factor
of two.
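The element-wise comparison can be written as a short predicate. In this sketch the function name `similar_enough` and the representation of each hypothesis by its state vector and the diagonal of its covariance are assumptions; the default parameter values are the 0.1 and 2.0 quoted in the text.

```python
import math


def similar_enough(x1, p1_diag, x2, p2_diag, beta=0.1, gamma=2.0):
    """Ad hoc merge test applied independently to every state element."""
    for a, b, p, q in zip(x1, x2, p1_diag, p2_diag):
        if not abs(a - b) < beta * math.sqrt(p + q):
            return False               # state estimates too far apart
        if not (p < gamma * q and q < gamma * p):
            return False               # variances differ by more than a factor gamma
    return True


close = similar_enough([0.0, 0.0], [1.0, 1.0], [0.05, 0.0], [1.0, 1.0])
far = similar_enough([0.0, 0.0], [1.0, 1.0], [0.5, 0.0], [1.0, 1.0])
mismatched = similar_enough([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [3.0, 1.0])
```

Only the first pair passes: the second fails the state comparison and the third fails the variance ratio test.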


In  benign  tracking  environments,  the  hypothesis  selection  strategy  is  of  little 
importance,  as  long  as  the  number  of  hypotheses  maintained  is  adequate  to  ensure 
that  the  correct  hypothesis  is  maintained  with  a  high  probability.  In  more  adverse 
tracking  environments,  the  correct  association  hypothesis  may  appear  less  likely  than 
false  hypotheses  for  several  consecutive  processing  intervals.  Hence,  to  maintain 
the  correct  hypothesis,  either  a  much  larger  number  of  hypotheses  will  need  to  be 
maintained  (increasing  exponentially  with  the  number  of  processing  cycles  over  which 
the  association  remains  ambiguous),  or  a  more  efficient  hypothesis  selection  method 
will  be  required. 


2.5.11.1  Early Methods. The approach of the early methods proposed
in [1] and [31] appears on the surface to be very similar to that detailed in Section
3.3.4. Alspach [1] selects the Kolmogorov variational distance as the cost function,
defined as:

$$J_K = \int \big|\,f\{X(k)|\Omega_{N_h}(k)\} - f\{X(k)|\Omega_{N_r}(k)\}\,\big|\; dX(k) \tag{2.74}$$


where $f\{X(k)|\Omega_{N_h}(k)\}$ represents the full target state PDF, containing $N_h(k)$
hypotheses, and $f\{X(k)|\Omega_{N_r}(k)\}$ represents the reduced PDF containing $N_r(k)$
components ($N_r < N_h$), which is being fitted to the full PDF.¹⁵ The algorithm continues
by merging and pruning mixture components until the cost exceeds a certain threshold.

¹⁴ The comparison is performed in standard coordinates rather than rotated principal coordinates,
in order to avoid the computational loading associated with a matrix inverse for each pair of mixture
components.

¹⁵ I.e., $\Omega_{N_h}(k)$ represents the full parameters of the distribution (containing $N_h$ mixture
component weights, means and covariances), and $\Omega_{N_r}(k)$ represents the equivalent reduced set of
parameters for $N_r$ mixture components.


The  method  of  Lainiotis  and  Park  [31]  uses  the  Bhattacharyya  coefficient  as 
the  similarity  measure: 


$$\rho_B = \int \sqrt{ f\{X(k)|\Omega_{N_h}(k)\} \, f\{X(k)|\Omega_{N_r}(k)\} } \, dX(k) \tag{2.75}$$


Computation of these functions for Gaussian mixtures is a formidable task,
and the various implementations of [1, 31, 56] rely heavily on mathematical
approximations to be able to evaluate the functions without explicit numerical integration.
One of the assumptions invoked by Alspach [1] is that all components have the same
covariance. If different hypotheses in the filter propose that the target has and has
not had missed detections, then the resultant covariance matrices of the mixture
components will be different, hence this approximation is undesirable. Furthermore,
once mixture components are merged, the covariance matrices will be modified by the
spreading terms of Eq. (2.24), again making the assumption of identical covariance
matrices problematic.

Lainiotis [31] uses mathematical approximations to evaluate the cost of merging
and deleting components. As discussed in [31:625], the Bhattacharyya coefficient
between the original Gaussian mixture and the same mixture with a single component
deleted is bounded below by:


(2.76) 


Pa  >  1  -  \Pn 


where $p_n$ is the weight of the deleted component. Similarly, the Bhattacharyya
coefficient between the original Gaussian mixture and the same mixture with a single
pair of components merged is bounded below by:

$$\rho_B \geq 1 - (p_i + p_j) \sqrt{1 - \rho_{i,j}^2} \tag{2.77}$$

where $p_i$ and $p_j$ are the weights of the two components to be merged, and $\rho_{i,j}$ is the
Bhattacharyya coefficient between the two components to be merged (which, unlike
the Bhattacharyya coefficient between two Gaussian mixtures, is easily evaluated).
Using the expressions in Eqs. (2.76) and (2.77), the algorithm operates by merging
and deleting components which produce a worst-case reduction in the Bhattacharyya
coefficient that is smaller than a given threshold.^16
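The claim that the coefficient between two single Gaussians is easily evaluated can be illustrated with a minimal sketch for the scalar-state case. The function names are hypothetical, the closed form used is the standard one for two univariate Gaussians, and the merge bound assumes Eq. (2.77) reads $1 - (p_i + p_j)\sqrt{1 - \rho_{i,j}^2}$; none of this is taken from [31].

```python
import math


def bhattacharyya_gauss_1d(m1, v1, m2, v2):
    """Bhattacharyya coefficient between N(m1, v1) and N(m2, v2), scalar state.

    v1 and v2 are variances (assumed strictly positive).
    """
    v_sum = v1 + v2
    scale = math.sqrt(2.0 * math.sqrt(v1 * v2) / v_sum)
    return scale * math.exp(-((m1 - m2) ** 2) / (4.0 * v_sum))


def merge_bound(p_i, p_j, rho_ij):
    """Worst-case coefficient after merging i and j (assumed reading of Eq. 2.77)."""
    return 1.0 - (p_i + p_j) * math.sqrt(max(0.0, 1.0 - rho_ij ** 2))


# Two low-weight, nearly coincident components cost very little to merge.
rho = bhattacharyya_gauss_1d(0.0, 1.0, 0.1, 1.0)
bound = merge_bound(0.05, 0.05, rho)
```

A pair would be merged when `1 - bound` falls below the merging threshold; identical components give `rho = 1` and therefore a bound of exactly one.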


2.5.11.2  Mixture  Reduction  Algorithm.  In  the  context  of  the  problem 
of  tracking  a  single  target  in  clutter,  Salmond  proposed  two  algorithms  pBHdT]  for 
reducing  the  number  of  hypotheses  by  systematically  merging  hypotheses  based  on 
certain  similarity  criteria.  The  focus  of  the  study  was  to  produce  algorithms  which 
were  computationally  feasible  using  the  hardware  available  at  the  time. 

The  first  algorithm  is  referred  to  as  the  joining  algorithm.  The  operation  of 
the  algorithm  is  to  merge  pairs  of  mixture  components  successively  until  the  desired 
level  of  reduction  has  been  achieved.  The  distance  measure  utilized  to  gauge  the 
similarity  of  hypotheses  i  and  j  is  a  Mahalanobis-type  distance  measure: 


$$D_{ij}^2 = \frac{p_i p_j}{p_i + p_j} (\hat{x}_i - \hat{x}_j)^T P^{-1} (\hat{x}_i - \hat{x}_j) \tag{2.78}$$


where the covariance $P$ is the combined covariance for the entire mixture, as in
Eq. (2.23):

$$P = \sum_{i=1}^{N_h} p_i \left[ P_i + (\hat{x}_i - \mu)(\hat{x}_i - \mu)^T \right]$$

$$\mu = \sum_{i=1}^{N_h} p_i \hat{x}_i$$


The leading fraction in Eq. (2.78) provides a weighting which tends to favor merging
hypotheses that carry lower probability weight over those with higher probability.
The term acts as a smooth interpolation of the minimum of the two probabilities,
and may also be expressed as $(p_i^{-1} + p_j^{-1})^{-1}$.

16 Separate thresholds are used for merging and deleting.

The algorithm functions by calculating the distance between all pairs of
hypotheses, and merging the pair with the smallest distance. The operation continues
until the minimum distance is above a threshold:

$$T = 0.001 \dim(x)$$

which was determined based on visual inspection. The threshold is designed to
ensure that the mixture structure is not modified beyond an acceptable level. If the
desired level of reduction has not been achieved when this threshold is reached, then
the operation continues until the mixture has been simplified to the desired number
of components.
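The joining procedure can be sketched as follows for a scalar state, where each component is a hypothetical `(weight, mean, variance)` tuple. The moment-preserving merge used here is an assumption standing in for the merge equations referenced earlier in the chapter; because such a merge leaves the overall mixture variance unchanged, the normalizing variance of Eq. (2.78) is computed only once.

```python
def merge(a, b):
    """Moment-preserving merge of two (weight, mean, variance) components."""
    (pa, ma, va), (pb, mb, vb) = a, b
    p = pa + pb
    m = (pa * ma + pb * mb) / p
    # Weighted variance plus the spreading term from the differing means.
    v = (pa * va + pb * vb) / p + pa * pb * (ma - mb) ** 2 / p ** 2
    return (p, m, v)


def mixture_variance(mix):
    """Variance of the whole mixture (scalar analogue of the combined covariance P)."""
    mu = sum(p * m for p, m, _ in mix)
    return sum(p * (v + (m - mu) ** 2) for p, m, v in mix)


def pair_distance(a, b, total_var):
    """Scalar form of the joining distance of Eq. (2.78)."""
    (pa, ma, _), (pb, mb, _) = a, b
    return pa * pb / (pa + pb) * (ma - mb) ** 2 / total_var


def joining(mix, n_target, threshold=0.001):
    """Merge closest pairs until the distance exceeds the threshold and
    the mixture has at most n_target components."""
    mix = list(mix)
    total_var = mixture_variance(mix)
    while len(mix) > 1:
        pairs = [(pair_distance(mix[i], mix[j], total_var), i, j)
                 for i in range(len(mix)) for j in range(i + 1, len(mix))]
        d2, i, j = min(pairs)
        if d2 > threshold and len(mix) <= n_target:
            break
        merged = merge(mix[i], mix[j])
        mix = [c for k, c in enumerate(mix) if k not in (i, j)] + [merged]
    return mix


example = [(0.5, 0.0, 1.0), (0.3, 0.01, 1.0), (0.2, 10.0, 1.0)]
reduced = joining(example, n_target=2)
```

In this example the two nearly coincident components are joined and the distant one is left alone. The default threshold corresponds to 0.001 times a state dimension of one.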

The second algorithm proposed is the clustering algorithm, which combines
mixture components into groups (clusters) rather than pairs. The algorithm operates by
selecting a principal component for a cluster, denoted as component c (initially the
component with the largest probability weight), and merging all components that
are within a certain distance of the principal component. The distance measure used
is the alternative definition:


$$D_i^2 = \frac{p_i p_c}{p_i + p_c} (\hat{x}_i - \hat{x}_c)^T P_c^{-1} (\hat{x}_i - \hat{x}_c) \tag{2.79}$$


which normalizes using the covariance $P_c$ of the principal component, rather than the
total mixture covariance as in Eq. (2.78). Considering the measure of Eq. (2.79) as
the normalized distance of the i-th component mean from the principal component,
the threshold $T$ used for the distance test can be based on a $\chi^2$ test [7:429], with
the recommended value:

$$T = 0.05 \, T'$$

where $T'$ is such that $\{D_i^2 : D_i^2 < T'\}$ contains 99% of the $\chi^2$ PDF, where the
number of degrees of freedom is the number of states.

The  clustering  algorithm  continues  iteratively,  selecting  the  largest  unclustered 
component  as  the  principal  component  of  the  new  cluster  at  each  stage.  If  the  process 
completes  before  the  desired  amount  of  reduction  has  been  achieved,  the  algorithm  is 
repeated  with  a  larger  threshold.  The  computational  load  of  the  clustering  algorithm 
is  significantly  lower  than  that  of  the  joining  algorithm,  as  at  each  stage  the  distance 
of  each  component  to  the  principal  component  is  calculated,  rather  than  the  distance 
between  every  pair  of  components. 
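A minimal sketch of the clustering procedure for a scalar state is given below, again with hypothetical `(weight, mean, variance)` tuples. Two details are assumptions rather than statements from the source: each repeat pass operates on the already-clustered mixture, and the threshold is enlarged by a fixed factor `growth`. The 99% point of the chi-square PDF with one degree of freedom is taken as approximately 6.635.

```python
def merge_all(comps):
    """Moment-preserving merge of a list of (weight, mean, variance) components."""
    p = sum(c[0] for c in comps)
    m = sum(c[0] * c[1] for c in comps) / p
    v = sum(c[0] * (c[2] + (c[1] - m) ** 2) for c in comps) / p
    return (p, m, v)


def clustering(mix, n_target, threshold, growth=2.0):
    """Cluster around the largest remaining component, using the scalar form
    of Eq. (2.79); repeat with a larger threshold until n_target is reached."""
    mix = list(mix)
    while True:
        remaining = sorted(mix, key=lambda c: c[0], reverse=True)
        clusters = []
        while remaining:
            principal = remaining[0]          # largest unclustered weight
            pc, mc, vc = principal
            near, far = [principal], []
            for comp in remaining[1:]:
                p, m, _ = comp
                d2 = p * pc / (p + pc) * (m - mc) ** 2 / vc
                (near if d2 < threshold else far).append(comp)
            clusters.append(merge_all(near))
            remaining = far
        mix = clusters
        if len(mix) <= n_target:
            return mix
        threshold *= growth                   # assumed enlargement rule


components = [(0.5, 0.0, 1.0), (0.3, 0.1, 1.0), (0.2, 10.0, 1.0)]
clusters = clustering(components, n_target=2, threshold=0.05 * 6.635)
```

Each pass computes one distance per remaining component rather than one per pair, which is the source of the lower computational load noted above.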

In [38], Pao extends Salmond's work to admit the case of multiple sensors
and multiple targets. This extension is analogous to the extension from PDA to
JPDA: while the probabilistic model is updated to account for joint association event
probabilities, it does not maintain correlation between target estimates, and it does
not maintain lists of compatible tracks, hence it intrinsically forces independence
between target estimates. For example, if the hypotheses for two targets are forced
to be independent, then the PDF of joint target state will contain elements for each
pairing of hypotheses from the two targets, as illustrated in Figure 2.13, rather than
restricting the uncertainty to the actual joint hypotheses as illustrated in Figure
2.12(a).
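The effect of forcing independence can be seen in a small sketch. The representation of a joint hypothesis as a `(weight, target 1 label, target 2 label)` tuple and the function name are hypothetical, chosen only for illustration.

```python
from itertools import product


def forced_independent(joint):
    """Replace a joint hypothesis list by the product of its per-target marginals."""
    marg1, marg2 = {}, {}
    for w, h1, h2 in joint:
        marg1[h1] = marg1.get(h1, 0.0) + w
        marg2[h2] = marg2.get(h2, 0.0) + w
    return [(w1 * w2, h1, h2)
            for (h1, w1), (h2, w2) in product(marg1.items(), marg2.items())]


# Two genuine joint hypotheses ...
joint = [(0.6, "A1", "B1"), (0.4, "A2", "B2")]
# ... become four pairings once independence is imposed.
pairs = forced_independent(joint)
```

The pairings `("A1", "B2")` and `("A2", "B1")` receive non-zero weight even though neither was an actual joint hypothesis.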

2.5.12 Multidimensional Techniques. The techniques described thus far
have one aspect in common: they all process one frame of data (i.e., the measurements
resulting from a single complete radar scan) at a time. An alternative approach
which has gained popularity recently is to use multiple frames of data at once,
resolving measurement uncertainty using a sequence of data rather than a single scan
frame.

Multidimensional assignment is a recently developed extension of the GNN
algorithm described in Section 2.5.6, in which multiple sets of data (either multiple
scans from a single sensor, or data from multiple sensors) are simultaneously
considered for association. As with GNN, the technique uses hard assignment, with the
assignment selected after a global optimization considering all possible joint
association events over all data sets. As one would expect, such techniques are
computationally demanding, but recent algorithms such as Lagrangian relaxation [39] appear
to provide a near-optimal solution for a more acceptable computational burden [7].

Figure 2.13. The impact of forcing independence between targets in a
multiple hypothesis system: resultant joint target PDF
contains a hypothesis for each pairing of hypotheses
from each target, rather than only the actual joint
hypotheses as shown in Figure 2.12(a).

Other recent suggestions include multiple-scan JPDA [43], in which joint
association events over several scans are probabilistically averaged in one step, and
the Probabilistic Multiple Hypothesis Tracker (PMHT) [53], in which
measurement-target association probabilities for multiple scans are estimated from a block of data
using the Expectation-Maximization (EM) algorithm.

2.5.13 Interacting Multiple Model-Multiple Hypothesis Tracker. In the
preceding sections, algorithms were described which are able to track maneuvering
targets in situations in which the measurement is of known origin, as were algorithms
which are able to track multiple non-maneuvering targets in clutter, with
measurements of unknown origin. The obvious extension of these two separate developments
is to unify the two to create a system which is able to track maneuvering targets in
the presence of clutter, with measurements of unknown origin.

The systems described in [15, 17] represent a unification of the two preferred
techniques  from  each  section:  the  IMM  filter  for  maneuvering  target  tracking,  and 
the  MHT  for  data  association.  As  opposed  to  the  strategy  proposed  in  [30)53],  the 
IMM-based  approach  maintains  multiple  state  estimates  within  a  single  hypothesis 
branch,  thus  limiting  another  source  of  exponentially  increasing  hypotheses. 

The method described in [17] uses IMM only for state prediction and update,
and utilizes the single combined IMM estimate for measurement gating and
hypothesis likelihood evaluation. Gating is performed to validate each measurement j for
consideration with each hypothesis i using the combined estimate from the IMM for
the hypothesis. Thus the standard gating equation is used:

$$d_{j,i}^2 = (z_j - \hat{z}_i(k|k-1))^T S_i^{-1} (z_j - \hat{z}_i(k|k-1)) < \gamma \tag{2.80}$$

where $z_j$ is the j-th measurement, $\hat{z}_i$ is the predicted measurement for hypothesis
i, and $S_i$ is the covariance of the residual formed using these two. The details of
the algorithm are omitted from [17]; however, if the models in the IMM differ only in
model dynamics (such that the measurement models are identical), then the elements
in Eq. (2.80) could be evaluated using:


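$$
\begin{aligned}
\hat{z}_i(k|k-1) &= H \hat{x}_i(k|k-1) = H \sum_{m=1}^{N_f} \mu_{i,m}(k|k-1) \, \hat{x}_{i,m}(k|k-1) \\
S_i &= H P_i(k|k-1) H^T + R \\
P_i(k|k-1) &= \sum_{m=1}^{N_f} \mu_{i,m}(k|k-1) \left\{ P_{i,m}(k|k-1) + [\hat{x}_{i,m}(k|k-1) - \hat{x}_i(k|k-1)][\cdot]^T \right\}
\end{aligned}
\tag{2.81}
$$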


where $\hat{x}_{i,m}(k|k-1)$, $P_{i,m}(k|k-1)$ and $\mu_{i,m}(k|k-1)$ are the predicted estimate,
covariance and probability of the m-th model from the IMM for hypothesis i,
calculated as described in Section 2.4.5. Using the expressions in Eq. (2.81) for the
combined predicted measurement, the measurement-to-track association likelihoods
can be calculated identically to the single-model case, as per Section 2.5.10.
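As a rough illustration of the combined-estimate gating described above, the following sketch evaluates the quantities of Eqs. (2.80) and (2.81) for a scalar state and scalar measurement. The function names and the tuple layout `(model probability, predicted state, predicted variance)` are hypothetical.

```python
def combined_prediction(models, h, r):
    """Moment-matched IMM prediction for one hypothesis (scalar case).

    models: list of (mu, x_pred, p_pred), one entry per IMM model
    h:      scalar measurement matrix H
    r:      measurement noise variance R
    """
    x = sum(mu * xm for mu, xm, _ in models)
    # Model variances plus the spread of the model means about the combined mean.
    p = sum(mu * (pm + (xm - x) ** 2) for mu, xm, pm in models)
    z_pred = h * x
    s = h * p * h + r
    return z_pred, s


def in_gate(z_meas, z_pred, s, gamma):
    """Standard gating test on the squared normalized residual."""
    return (z_meas - z_pred) ** 2 / s < gamma


z_pred, s = combined_prediction([(0.5, 0.0, 1.0), (0.5, 2.0, 1.0)], h=1.0, r=1.0)
accepted = in_gate(2.0, z_pred, s, gamma=9.0)
```

With two equally likely models predicting 0 and 2, the combined prediction is 1 and the residual variance is inflated from 2 to 3 by the spread between the models.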

The IMM-MHT can also be evaluated using a slightly different approach, as
described in [15]. Rather than performing measurement gating and hypothesis
probability calculation using a single combined estimate, the alternative strategy
modifies the gating to be based on the lowest distance of the IMM filters (i.e., the filter
demonstrating the best match):

$$\min_m d_{j,i,m}^2 < \gamma \tag{2.82}$$

where $d_{j,i,m}^2$ is the Mahalanobis distance between the j-th measurement and the
measurement predicted by the m-th model for the i-th track.

The measurement-to-track association likelihood proposed by [15] also differs
from the technique in [17], utilizing a weighted average of the match likelihoods for
each of the IMM models, rather than a single match likelihood to the combined IMM
estimate:

$$\Lambda = \frac{P_d}{\lambda} \sum_{m=1}^{N_f} \mu_{i,m} \, |2\pi S_{i,m}|^{-\frac{1}{2}} \exp\left(-\tfrac{1}{2} d_{j,i,m}^2\right) \tag{2.83}$$

where $\mu_{i,m}$ is the probability of the m-th model for the hypothesis i, $d_{j,i,m}^2$ is the
Mahalanobis distance from the m-th model of hypothesis i to measurement j (similar
to Eq. (2.80)), $S_{i,m}$ is the covariance of the residual formed from measurement j and
the measurement prediction from model m of hypothesis i, $P_d$ is the probability of
detection, and $\lambda$ is the false alarm density.
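The per-model gating and likelihood can be sketched for a scalar measurement as below. The tuple layout `(model probability, predicted measurement, residual variance)` and the function names are hypothetical, and the likelihood assumes Eq. (2.83) is the detection-to-clutter ratio times a model-weighted sum of Gaussian match likelihoods.

```python
import math


def model_distances(z_meas, preds):
    """Squared normalized residual of the measurement against each model."""
    return [(z_meas - zp) ** 2 / s for _, zp, s in preds]


def gate_best_model(z_meas, preds, gamma):
    """Gate on the best-matching model, as in Eq. (2.82)."""
    return min(model_distances(z_meas, preds)) < gamma


def association_likelihood(z_meas, preds, p_d, lam):
    """Model-weighted likelihood (assumed reading of Eq. 2.83), scalar case."""
    total = sum(mu * math.exp(-0.5 * (z_meas - zp) ** 2 / s) / math.sqrt(2.0 * math.pi * s)
                for mu, zp, s in preds)
    return p_d / lam * total


preds = [(0.8, 0.0, 1.0), (0.2, 5.0, 4.0)]
passes = gate_best_model(4.5, preds, gamma=9.0)
likelihood = association_likelihood(4.5, preds, p_d=0.9, lam=0.1)
```

Here the measurement at 4.5 fails the gate of the first model but passes through the second, which a gate on the single combined estimate might not reproduce.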

2.5.14 Summary. The previous sections have described the techniques
commonly used to address the problem of the ambiguity of measurement origin in
tracking systems. The probabilistic model for association events presented in Section
2.5.2 is utilized by the majority of the data association algorithms in common use; the
various approximations applied by these algorithms were described in the ensuing
sections. Section 2.5.12 briefly discussed some of the recent developments which aim
to consider the association of several frames of data at once, while Section 2.5.13
outlined a technique which combines the IMM algorithm with the MHT to be able
to track maneuvering targets in the presence of clutter.

2.6  Optimization  Methods 

In Chapter III, we will define a cost function that describes the fidelity of the
representation of the target state probability density provided by a reduced order
PDF. The goal of this study will then be to maximize the fidelity of the simplified
representation by minimizing the value of the cost function. If the cost function were
simple in form, it might be possible to solve exactly for the PDF parameters which
produce the minimum cost solution. However, in this problem, any meaningful cost
function will be highly nonlinear, and numerical optimization procedures will be
unavoidable.

Figure 2.14. Gradient of the cost function indicating the direction of
the minimum.

Numerical  optimization  involves  techniques  which  iteratively  converge  on  an 
optimal  solution  that  cannot  be  found  exactly  using  analytic  methods.  In  the  context 
of  this  thesis,  we  will  be  seeking  the  minimum  value  of  the  cost  function,  and  the 
iterative  techniques  employed  will  be  designed  to  descend  as  close  to  the  minimum 
as  possible,  in  as  few  steps  as  possible. 

Gradient techniques [36:33] are numerical optimization methods which use the
first derivative (gradient) of the cost function to step iteratively towards the
minimum. Their operation in a one-dimensional problem is illustrated in Figure 2.14: if
the gradient is positive, then the cost function is increasing to the right, hence the
minimum must be to the left; conversely, if the gradient is negative, then the cost
function is increasing to the left, hence the minimum must be to the right.

The update step for the standard gradient algorithm is described by the
following equation [36:33]:

$$x_{k+1} = x_k - s_k \frac{g_f(x_k)}{\|g_f(x_k)\|} \tag{2.84}$$

where $g_f(x)$ is the gradient of the cost function $f(x)$:^17

$$g_f(x) = \frac{\partial f(x)}{\partial x} = \nabla f(x)$$


and $s_k$ is the scalar step size for the k-th iteration of the algorithm. The update
moves a distance of $s_k$ in parameter space, in the direction of the negative of the
gradient  vector.  The  step  size  provides  a  trade-off  between  the  speed  of  convergence 
and  the  accuracy  of  the  final  result.  Using  a  large  step  size  at  the  beginning  of  the 
search  assists  in  increasing  the  rate  of  convergence;  reducing  the  step  size  as  the 
search  progresses  helps  to  refine  the  solution  to  provide  a  very  accurate  final  result, 
and  avoid  overshooting  the  solution.  A  gradient  algorithm  step  should  be  guaranteed 
to  decrease  the  value  of  the  cost  function,  hence  if  the  cost  function  value  increases, 
then  the  step  size  was  too  large,  and  should  be  reduced.  If  several  sequential  steps 
produced  by  the  algorithm  move  in  the  same  or  a  very  similar  direction,  then  the  step 
size  should  be  increased;  conversely,  if  sequential  steps  move  in  the  opposite  direction, 
then  the  step  size  is  too  large  and  should  be  decreased.  If  the  step  size  is  close  to  its 
optimal  value,  then  the  gradient  vector  should  be  approximately  orthogonal  to  the 
value at the previous step. One ad hoc algorithm for step size control based on these
observations calculates the angle $\beta_k$ between successive gradient vectors [36:40]:

$$
\begin{aligned}
\cos \beta_k &= \frac{g_f(x_k)^T g_f(x_{k-1})}{\|g_f(x_k)\| \cdot \|g_f(x_{k-1})\|} \\
s_{k+1} &= [1 + 0.9 \cos \beta_k] \, s_k
\end{aligned}
\tag{2.85}
$$
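The step and step size rules can be combined in a short sketch. It assumes the update of Eq. (2.84) is a normalized gradient step of length $s_k$ and applies the angle rule of Eq. (2.85) before each step; the halving loop that rejects cost-increasing steps is one simple way to realize the "reduce the step size if the cost increases" rule and is an assumption, as are the function and parameter names.

```python
import math


def gradient_descent(f, grad, x0, s0=1.0, iters=200, tol=1e-10):
    """Normalized gradient descent with angle-based step size control."""
    x, s, g_prev = list(x0), s0, None
    for _ in range(iters):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm < tol:
            break
        if g_prev is not None:
            norm_prev = math.sqrt(sum(gi * gi for gi in g_prev))
            cos_beta = sum(a * b for a, b in zip(g, g_prev)) / (norm * norm_prev)
            s *= 1.0 + 0.9 * cos_beta     # grow if aligned, shrink if reversed

        def step(size):
            # Move a distance `size` along the negative unit gradient.
            return [xi - size * gi / norm for xi, gi in zip(x, g)]

        x_new = step(s)
        # Reject steps that fail to decrease the cost (assumed safeguard).
        while f(x_new) >= f(x) and s > 1e-15:
            s *= 0.5
            x_new = step(s)
        x, g_prev = x_new, g
    return x


def cost(x):
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2


def cost_grad(x):
    return [2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)]


x_min = gradient_descent(cost, cost_grad, [4.0, 2.0])
```

On this simple bowl the iterate overshoots once, the reversed gradient cuts the step size by a factor of ten, and the search then settles near the minimum at (1, -2).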


The Newton-Raphson method operates similarly to the gradient technique,
but uses information provided by the second derivative to converge on the minimum
value at a much faster rate close to the solution. The standard Newton-Raphson
step (for the vector parameter case) is given by [36:55]:

$$x_{k+1} = x_k - A(x_k)^{-1} g_f(x_k)$$

where $A(x_k)$ is the Hessian matrix:

$$\{A(x_k)\}_{ij} = \frac{\partial^2 f(x_k)}{\partial \{x_k\}_i \, \partial \{x_k\}_j} = \{A(x_k)\}_{ji} \tag{2.86}$$

17 For convenience we choose to define the derivative of a scalar with respect to a vector as a
column vector, as opposed to the convention that this result is a row vector.


The operation of the Newton-Raphson algorithm is to step to the minimum of
the parabola which approximates the cost function at the current point. If the cost
function in the region of the current parameter value is well approximated by the
second order Taylor series terms (i.e., the locally fitted parabola), then the result
of the step will move very close to the solution. This is illustrated in Figure 2.15:
the cost function shown is $x^4$; the step illustrated moves from the original parameter
value to the minimum of the parabola with first and second derivatives that match
those of the cost function at the original point.
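A scalar sketch of the Newton-Raphson step on the quartic cost used in the figure is shown below; the function name and starting value are chosen only for illustration.

```python
def newton_raphson(grad, hess, x, steps):
    """Scalar Newton-Raphson: step to the minimum of the local parabola."""
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x


def quartic_grad(x):
    return 4.0 * x ** 3        # first derivative of x^4


def quartic_hess(x):
    return 12.0 * x ** 2       # second derivative of x^4


one_step = newton_raphson(quartic_grad, quartic_hess, 3.0, steps=1)
ten_steps = newton_raphson(quartic_grad, quartic_hess, 3.0, steps=10)
```

For this cost each step reduces to multiplying the current value by 2/3, so the first step from 3 lands on 2. The quartic is not well approximated by a parabola near its minimum (its second derivative vanishes there), so convergence here is only linear rather than the rapid convergence expected when the local parabola fits well.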

One obvious requirement of the Newton-Raphson method is that the Hessian
matrix must be non-singular. If this is not the case, then $A(x_k)^{-1}$ cannot be
evaluated, and the technique cannot be used.

Even  if  the  full  Hessian  is  not  calculated,  it  can  be  beneficial  to  utilize  some  of 
the  information  from  the  second  derivative  matrix  in  the  computation.  For  example, 
if  a  diagonal  weighting  matrix  is  utilized  in  order  to  force  the  cost  function  contours 
in  the  local  region  to  be  roughly  circular,  then  the  resulting  weighted  gradient  step 
will  move  directly  toward  the  solution,  overshooting  less  and  taking  fewer  steps  to 
converge [30:34]. Thus, using a Hessian matrix with only diagonal terms, or a
block-diagonal Hessian matrix, may speed convergence somewhat.

Figure 2.15. Operation of the Newton-Raphson algorithm: each
step moves to the minimum of the local approximating
parabola.

The various techniques described in these sections will converge to a minimum
provided that the algorithm is commenced from within the region of convergence.
The gradient method is guaranteed to converge to a local optimum from anywhere
in the search space. The region of convergence for the Newton-Raphson method
is a finite-sized convergence ball; diagonal or block-diagonal Newton-Raphson
approximations will be somewhere between the two. If the cost function has a single
global minimum and it increases from that point in every direction, then the solution
found by the iterative algorithm is guaranteed to be the global minimum. The cost
function forms defined in Chapter III do not have this characteristic, but rather they
are extremely multi-modal, with many maxima and minima. In this situation, the
algorithms will converge (assuming that the starting point supplied to the algorithm
is inside the region of convergence) to a local minimum (most likely the minimum
closest to the starting point given to the algorithm), and there is no guarantee that
this point will indeed represent the global minimum.

III. Analysis


3.1 Introduction

As outlined in Section 1.2, the goal of this study is to develop techniques which
are able to maintain a high fidelity representation of the PDF of target state, focusing
on the efficiency of this representation. The most compact PDF representation in
common use is that of a single Gaussian function; Section 3.2 examines some of the
difficulties which are commonly experienced using such a coarse approximation.

Section 3.3 then develops an algorithm which aims to provide the best possible
representation of the target state PDF using any given number of components in
a Gaussian mixture. The algorithm is based on the minimization of a cost
function; possible selections for this cost function are considered in Section 3.3.1. The
cost function selected for our algorithm, the Integral Square Difference (ISD) cost,
is examined in detail in Section 3.3.2, before iterative optimization techniques are
applied in Section 3.3.3. Finally, it is apparent that iterative optimization of such
a multi-modal function is highly dependent on the starting point provided to the
algorithm; a methodology for deriving a near-optimal starting point is developed in
Section 3.3.4.

3.2  PDA  Bias  and  Coalescence 

The JPDA algorithm is computationally desirable when compared to more
modern MHT algorithms, as the tracking system is required to maintain only a
single Gaussian PDF rather than a Gaussian mixture with a component corresponding
to each hypothesis, with the number of hypotheses growing at an exponential rate.
However, the original formulation of the JPDA algorithm exhibits significant
difficulty in tracking closely-spaced targets. Hong, et al. [161 EH 122] suggest that the
difficulty in tracking closely-spaced targets exhibited by JPDA is due to a bias
inherent in the algorithm, arising from the Kalman filter measurement update equation:

$$\hat{x}_i(k|k) = \hat{x}_i(k|k-1) + K_i(k) [z_i(k) - \alpha_i(k) H \hat{x}_i(k|k-1)] \tag{3.1}$$

where $z_i(k)$ is the combined measurement for target i, $H \hat{x}_i(k|k-1)$ is the predicted
measurement for target i, and $\alpha_i(k)$ is the scaling factor which accounts for events
under which the target is not detected and no update is performed.

Despite the suggestion above, if the estimate prior to incorporation of the
measurement is unbiased, and the residual in Eq. (3.1) is zero-mean, then the updated
estimate is guaranteed to be unbiased. To determine whether a bias will be
introduced, we need to take the expected value of the residual. Denoting the expected
value of the residual for target i as $b_i(k)$, and expanding using the definitions of
Section 2.5.7:

$$
\begin{aligned}
b_i(k) &= E\{ z_i(k) - \alpha_i(k) H \hat{x}_i(k|k-1) \} \\
&= E\Big\{ \sum_l P\{\Theta_l(k)|Z^k\} \, [z_{m(\Theta_l(k),i)}(k) - H \hat{x}_i(k|k-1)] \Big\}
\end{aligned}
\tag{3.2}
$$

where the summation is taken over all joint events in which target i is hypothesized
to have been detected, and $m(\Theta_l(k),i)$ is the measurement associated with target i
under the event $\Theta_l(k)$.

The expectation of Eq. (3.2) could be taken over a number of different variables.
Since we are concentrating on the bias arising from only a single processing cycle,
the measurements from previous cycles are assumed known, and the expectation is
taken only over the measurements from the processing cycle under consideration.
Cong [16:122] also takes the expectation over the number of measurements ($N_m(k)$)
in the cycle under consideration. Observing that the number of measurements will
be known exactly at run-time when this processing is performed, this would seem
unnecessary. Furthermore, since:

$$E_{x,y}\{g(x,y)\} = E_y\{E_x\{g(x,y)|y\}\} \tag{3.3}$$

as is derived by:

$$
\begin{aligned}
E_y\{E_x\{g(x,y)|y\}\} &= \int \left[ \int g(x,y) f\{x|y\} \, dx \right] f\{y\} \, dy \\
&= \iint g(x,y) f\{x|y\} f\{y\} \, dx \, dy \\
&= \iint g(x,y) f\{x,y\} \, dx \, dy \\
&= E_{x,y}\{g(x,y)\}
\end{aligned}
$$

if the expression obtained by assuming conditioning on the number of
measurements is shown to be unbiased (such that $b_i(k) = 0$), then the equivalent result
also taking expectation over the number of measurements will also be unbiased.

Evaluating Eq. (3.2) assuming conditioning on the previous measurement
history and the number of measurements in the current cycle, and expanding
$P\{\Theta_l(k)|Z^k\}$ using Eq. (2.55), we obtain:

$$
\begin{aligned}
b_i(k) &= \int_{-\infty}^{\infty} \Big\{ \sum_l P\{\Theta_l(k)|Z^k\} \, [z_{m(\Theta_l(k),i)}(k) - H \hat{x}_i(k|k-1)] \Big\} f\{Z_k|N_m(k), Z^{k-1}\} \, dZ_k \\
&= \int_{-\infty}^{\infty} \Big\{ \sum_l \frac{f\{Z_k|\Theta_l(k), N_m(k), Z^{k-1}\} \, P\{\Theta_l(k)|N_m(k)\}}{f\{Z_k|N_m(k), Z^{k-1}\}} \, [z_{m(\Theta_l(k),i)}(k) - H \hat{x}_i(k|k-1)] \Big\} f\{Z_k|N_m(k), Z^{k-1}\} \, dZ_k \\
&= \int_{-\infty}^{\infty} \sum_l f\{Z_k|\Theta_l(k), N_m(k), Z^{k-1}\} \, P\{\Theta_l(k)|N_m(k)\} \, [z_{m(\Theta_l(k),i)}(k) - H \hat{x}_i(k|k-1)] \, \frac{1}{f\{Z_k|N_m(k), Z^{k-1}\}} \, f\{Z_k|N_m(k), Z^{k-1}\} \, dZ_k
\end{aligned}
\tag{3.4}
$$

where, as in Section 2.2.3, the vector limits $(-\infty, \infty)$ remind us that the
integration is to be performed over every element of the vector $Z_k$. Cancelling the
$f\{Z_k|N_m(k), Z^{k-1}\}$ terms in the numerator and denominator, and exchanging the
order of integration and summation:


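$$
\begin{aligned}
b_i(k) &= \int_{-\infty}^{\infty} \Big\{ \sum_l f\{Z_k|\Theta_l(k), N_m(k), Z^{k-1}\} \, P\{\Theta_l(k)|N_m(k)\} \, [z_{m(\Theta_l(k),i)}(k) - H \hat{x}_i(k|k-1)] \Big\} \, dZ_k \\
&= \sum_l P\{\Theta_l(k)|N_m(k)\} \int_{-\infty}^{\infty} f\{Z_k|\Theta_l(k), N_m(k), Z^{k-1}\} \, [z_{m(\Theta_l(k),i)}(k) - H \hat{x}_i(k|k-1)] \, dZ_k
\end{aligned}
\tag{3.5}
$$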


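The integral in Eq. (3.5) can then be broken up into the following difference:

$$
\begin{aligned}
\int_{-\infty}^{\infty} & f\{Z_k|\Theta_l(k), N_m(k), Z^{k-1}\} \, [z_{m(\Theta_l(k),i)}(k) - H \hat{x}_i(k|k-1)] \, dZ_k \\
&= \int_{-\infty}^{\infty} z_{m(\Theta_l(k),i)}(k) \, f\{Z_k|\Theta_l(k), N_m(k), Z^{k-1}\} \, dZ_k \\
&\quad - \int_{-\infty}^{\infty} H \hat{x}_i(k|k-1) \, f\{Z_k|\Theta_l(k), N_m(k), Z^{k-1}\} \, dZ_k
\end{aligned}
\tag{3.6}
$$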


The first term in Eq. (3.6) amounts to the predicted mean of the measurement
associated with target i under association event $\Theta_l(k)$, which is merely the measurement
prediction $H \hat{x}_i(k|k-1)$. The second term has no variables in $Z_k$ other than the
density itself, hence the integral of the density evaluates to unity, again leaving
$H \hat{x}_i(k|k-1)$. These two terms obviously cancel, such that the bias of JPDA for
target i is:


$$
\begin{aligned}
b_i(k) &= \sum_l P\{\Theta_l(k)|N_m(k)\} \cdot 0 \\
&= 0
\end{aligned}
\tag{3.7}
$$

Thus it has been shown that, according to the measurement model presented
in Section 2.5.2, JPDA is in fact unbiased. Following identical steps for the CPDA
algorithm, allowing correlation between targets, will arrive at a similar result. The
scarce details provided in [16:122] for the evaluation of the JPDA bias make it very
difficult to compare the results derived above with those previously published. One
mistake which is easily made^1 is to treat the denominator of Eq. (2.55) as a constant
(with respect to the measurement values), thereby evaluating the integral
by the approximation:

$$\int \frac{f_1(x)}{f_2(x)} \, dx \approx \frac{\int f_1(x) \, dx}{\int f_2(x) \, dx} \tag{3.8}$$

In this case, the measurement PDF due to the expectation operation is not cancelled
with the measurement PDF in the denominator of the event probability. Instead,
the denominator of the event probability is treated as a constant and neglected as
per the development of Section 2.5.2,^2 leaving the product of the two PDFs, which
can be evaluated with some difficulty. The cancellation of the measurement PDFs
would not have been possible if the expectation operation had been extended initially
across the number of measurements; this further suggests that this error may have
been made in [16:122]. However, the expression of Eq. (3.8) is without mathematical
basis, hence one would expect that the robustness of any apparent performance gain
induced by introducing the approximation would be highly questionable. The results
obtained in this study using this approximation appeared promising in some areas,
but any overall improvement in any realistic scenario was not evident.

1 Indeed a substantial amount of time was lost during this study due to this very error.

2 In Section 2.5.2 it was argued that the denominator is constant across all association events: it
is not constant across measurement values, which are the variables of integration in this expression.

The puzzling aspect of this result is that the JPDA algorithm does exhibit
a form of "bias" when targets are closely spaced: to such an extent that the two
estimates can essentially converge to the mid-point between the targets in scenarios
in which targets are closely spaced for extended periods of time. This phenomenon,
referred to as coalescence, was examined in [9, 11, 12], in which it was concluded
that [ES256]:


...the conditional density of the targets' joint state has a particular
multimodality: in addition to the local optimum for the nonswapped tracks,
often other local optima exist for track swap possibilities. The approach
of centering a Gaussian optimally (in the MMSE sense) between these
local optima implies a preference to track coalescence over track swap.

Hence  this  multi-modality  explains  how  the  JPDA  can  be  unbiased  yet  still  have 
difficulties with track coalescence. As further suggested by Blom and Bloem [12],
the  major  source  of  uncertainty  in  this  particular  scenario  is  the  identity  of  the 
target.  In  this  case,  there  will  be  two  primary  joint  hypotheses:  one  will  be  correct, 
and  the  other  will  be  identical,  but  with  the  two  targets  exchanged  (which  is  equally 
valid  as  far  as  the  tracker  is  concerned  since  measurements  are  not  labelled).  Thus 
for  all  the  system  knows  (from  the  set  of  measurements  it  has  been  given),  either  of 
these  two  primary  joint  hypotheses  could  be  the  correct  one,  hence  the  best  unbiased 
answer  in  a  minimum  mean  square  error  sense  is  to  “hedge  your  bets”  either  way. 
In  other  words,  the  system  no  longer  knows  which  target  is  which,  so  it  simply 
takes  the  average  of  the  two  possibilities,  tracking  the  mid-point  between  them. 
As  highlighted  by  Blom  and  Bloem,  in  virtually  any  practical  situation  it  is  more 
desirable  to  track  the  targets  with  the  incorrect  identity  than  to  track  the  mid-point 
between  the  targets,  hence  it  is  desirable  to  force  the  system  to  choose  one  joint 
hypothesis  or  the  other  —  irrespective  of  whether  the  correct  hypothesis  is  selected, 
the  result  will  be  preferable  over  allowing  the  tracks  to  coalesce.  This  is  effectively 
what is done by the JPDA* and CPDA* algorithms³ presented in [9, 11, 12]: for
each  proposed  set  of  detected  targets  and  target-originated  measurements,  only  the 
best  association  event  is  maintained,  hence  avoiding  the  situation  described  above. 

The problem of track coalescence can be illustrated by considering the position
of two targets in joint state space, as per the example discussed in Section 2.5.8. The
difficulty of the tracking problem is that the measurements are unlabelled. Accordingly,
as far as the radar system is concerned, there are two equally valid tracking
solutions: the correct solution, and the same situation with the identity of the two
targets exchanged. These solutions are illustrated in the depiction of joint target
state space shown in Figure 3.1: for each point in joint target state space (such as
the marked sample points), the reflection in the 45° line (mapping the sample
points to the locations shown by ‘o’) represents an equally valid tracking solution,
with the identity of the two targets exchanged.

3 As proposed in [12:248], the “*” notation is short for “track-coalescence-avoiding”.

Figure 3.1. Pairs of equally valid tracking solutions in joint target
state space (recovered axis label: Target 2 Position).

When the estimated target position is far from the 45° line, the presence of
this alternative tracking solution does not affect the performance of the algorithm,
because the alternative solution will be weighted with a very low probability.⁴ If
the targets move close together, then the joint state moves close to the 45° line,
which brings the two tracking solutions close together, increasing the probability of
the alternative solution. As this probability increases, the weighted mean estimate
is drawn increasingly towards the centroid of the two tracking solutions, resulting
in a single coalesced track between the two possibilities. When the targets begin
to separate after being close together (as in the case of two crossing tracks), the
tracking algorithm will attempt, as far as the PDF representation is able, to fit a
single Gaussian component to the two hypotheses. If the two hypotheses are equally
likely, then the resulting estimate will be between the two tracking solutions, with
a covariance that attempts to encompass both identity possibilities. Such a case is
illustrated in Figure 3.2; this is a typical example of coalescence using the JPDA
algorithm.

4 The weight applied to the alternative solution will be zero if the solution does not satisfy the
gating equations.

Figure 3.2. Coalesced joint target state estimate and covariance using
JPDA algorithm (recovered axis label: Target 2 Position).
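The coalescence mechanism described above can be reproduced with a minimal numerical sketch. It assumes two scalar targets at the hypothetical positions +1 and −1, and two equally weighted joint hypotheses that differ only by an exchange of target identity; the function name `moment_match` and all numerical values are illustrative and are not taken from the original study.

```python
def moment_match(weights, means, covs):
    """Collapse a Gaussian mixture to one Gaussian with the same mean and covariance."""
    dim = len(means[0])
    mean = [sum(w * m[d] for w, m in zip(weights, means)) for d in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for w, m, c in zip(weights, means, covs):
        for a in range(dim):
            for b in range(dim):
                # Component covariance plus the spread of its mean about the overall mean.
                cov[a][b] += w * (c[a][b] + (m[a] - mean[a]) * (m[b] - mean[b]))
    return mean, cov


# Assumed scenario: scalar targets at +1 and -1; the second hypothesis swaps identities.
weights = [0.5, 0.5]
means = [[1.0, -1.0], [-1.0, 1.0]]
covs = [[[0.1, 0.0], [0.0, 0.1]], [[0.1, 0.0], [0.0, 0.1]]]

mean, cov = moment_match(weights, means, covs)
rho = cov[0][1] / (cov[0][0] * cov[1][1]) ** 0.5
print("combined estimate:", mean)
print("correlation coefficient:", round(rho, 3))
```

Under these assumed numbers both target estimates collapse to the mid-point, and the matched covariance carries a strong negative correlation between the targets (about −0.91), which is the behaviour the text attributes to fitting a single Gaussian across both identity possibilities.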

As discussed in Section 2.5.8, CPDA introduces correlation between targets in
an effort to improve performance when tracking closely spaced targets, but the
inclusion of correlation actually causes track coalescence to worsen. This phenomenon
is very difficult to explain. Blom and Bloem [12:254] suggest that it is caused by the
strong correlation which develops between targets, making the algorithm prefer to
keep estimates between competing measurements. However, as discussed in Section
2.5.8, the correlation which develops is inevitably negative correlation, which would
tend to separate the targets (as compared to JPDA), rather than keeping
them together.

Considering the discussion above, the conclusion of this study is similar to that
of Blom and Bloem: the reduced coalescence performance of the CPDA algorithm
is caused by correlation arising from the two tracking hypotheses with exchanged
target identity. However, the difficulty is not that the correlation makes the tracker
attempt to keep the targets together, but rather that correlation allows
the system to keep both tracking hypotheses within its field of view. The correlation
between the target state vectors operates like blinders on a horse, concentrating the
field of view of the algorithm on the area containing the two primary association
possibilities, and excluding the distraction caused by other less likely association
hypotheses. It is this very distraction that rescues JPDA from coalescence: a
comparatively lower weighting will be applied to the two major modalities, hence other
association hypotheses will tend to enter, and the estimate will be pulled away from
the 45° line, resolving the coalescence.

The two PDFs in Figure 3.3 illustrate an example in which allowing correlation
between targets results in a high degree of correlation, with the correlation
coefficient at −0.9, as shown in Figure 3.3(b). Figure 3.3(a) shows the same region of the
same PDF, where the correlation between targets has been discarded. Both PDFs
are clipped to a maximum value of 0.005 to illustrate the relative size and shape
of equally likely contours of the joint target state. As can be seen, the skewness
produced by the correlation elongates the function towards the two valid
tracking possibilities, hence ensuring that both hypotheses remain relatively likely
as compared to other possible association events.

Figure 3.4 is a snapshot of the joint target position in a Monte Carlo simulation.
The scenario is a two-dimensional tracking problem with two slowly-crossing
targets, crossing in the y axis. The dynamics noise was set close to zero in the x
axis, essentially making the problem one-dimensional. The diagram shows the joint
position of the two targets after coalescence has occurred (both position and velocity
are estimated). The ‘x’ marks indicate the association hypotheses, which are
combined into the single estimate marked by ‘o’. The covariance of the combined
estimate with correlation is illustrated through the error ellipse, demonstrating the
high degree of negative correlation between targets which results from combining
the hypotheses. If correlation between targets were not admitted, then this ellipse
would be roughly circular, and the probability of the association hypotheses would
be drawn towards the central estimates, before being dragged off by one primary
hypothesis or the other. The correlation allows the two primary hypotheses, which
are identical other than a switch in tracks, to remain probable, drawing probability
away from the incorrect hypotheses that would otherwise resolve the deadlock.

Figure 3.3. Joint target state PDF, (a) disallowing correlation between
targets (panel title: Uncorrelated), and (b) allowing correlation between
targets (panel title: Correlated; correlation coefficient = −0.9).

3.3  Gaussian  Mixture  Reduction 

As discussed in Section 2.5, the common methods utilized in modern target
tracking techniques apply different simplifications to the PDF of target state
given the set of received measurements. Techniques such as JPDA [5:310-319] and
GNN [7:338-342] perform a vast simplification, reducing the entire Gaussian mixture
to a single Gaussian component, maintaining independence between targets.
Techniques including JPDAC [4:328-329], CPDA [12] and Maximum Likelihood
methods [25] also reduce the Gaussian mixture PDF to a single component, but by
allowing correlation between the target states, information about target relationships
is maintained.

While Salmond’s mixture reduction algorithms [44H4T] and Pao’s multiple target
extension [38] are able to retain any number of Gaussian mixture components,
as discussed in Section 2.5.11.2, the marginalization of target PDFs results in loss of
all information concerning the relationships between targets, forcing independence
between targets. The common MHT implementations permit this dependence by
maintaining hypothesis compatibility listings, but the ad hoc simplification methods
employed to merge and prune hypotheses (such as those described in [6:292-294])
potentially limit the usefulness of the retained mixture components.

Given the extreme rate of growth of hypotheses that results from the MHT
algorithm, some form of hypothesis control is unquestionably necessary. The ideal
implementation would maintain a set of hypotheses that is small enough to
be readily computable by the system in question, yet carries the information about
the original target PDF to the highest fidelity possible.

To proceed, we first define the original joint PDF of target state, containing
$N_h(k)$ joint hypotheses as to the possible locations of the targets, as:

$$f\{X(k) \mid \Omega_{N_h(k)}\}$$

where $\Omega_{N_h(k)}$ represents the parameters of the $N_h(k)$ hypotheses derived from the
measurements up to the current sample period. Our goal is thus to reduce these
$N_h(k)$ hypotheses to a simplified representation containing $N_r(k)$ hypotheses,⁵
resulting in the simplified PDF:

$$f\{X(k) \mid \Omega_{N_r(k)}\}$$

where $\Omega_{N_r(k)}$ represents the reduced set of parameters, containing, as closely as
possible, the same information as the original set $\Omega_{N_h(k)}$.

5 The subscript ‘r’ denoting ‘reduced’.

3.3.1  Cost  Measures.  In  order  to  simplify  the  PDF  while  making  the 
smallest  possible  overall  change,  the  first  step  is  to  select  a  scalar  cost  function 
which  measures  the  difference  between  two  PDFs  in  order  to  evaluate  whether  one 
PDF  approximation  is  “better”  than  another.  Once  such  a  function  has  been  defined, 
a  wide  variety  of  well-understood  optimization  methods  can  be  applied  to  determine 
the  parameters  of  the  reduced  PDF  which  minimize  the  cost  (and  hence  loss  of 
fidelity)  caused  by  the  reduction. 

One  can  conceive  of  any  number  of  constructs  which  would  serve  as  an  effective 
cost function. Two which have been previously proposed were discussed in Section
2.5.11.1: the Kolmogorov variational distance and the Bhattacharyya coefficient.

3.3.1.1 Bhattacharyya Distance. The Bhattacharyya coefficient and
the closely related Bhattacharyya distance are measures of the similarity of two PDFs,
made popular by [23]. The definition of the Bhattacharyya coefficient is as given by
Eq. (235):

$$J_B = \int \sqrt{f\{X(k) \mid \Omega_{N_h(k)}\} \, f\{X(k) \mid \Omega_{N_r(k)}\}} \; \mathrm{d}X(k)$$

while the Bhattacharyya distance is given by $B_d = -\ln J_B$.

The application in [23] is the maximization of the distance between two
distributions, such as that in the communications problem of determining whether
the transmitted bit was a ‘0’ or a ‘1’. Comparing the expected distribution of the
received signal when a ‘0’ is transmitted to that when a ‘1’ is transmitted results
in information closely related to the probability of error for the system. If this
comparison is performed utilizing a good distance measure, then optimization of
the system to maximize the distance between the two distributions will effectively
minimize the probability of error.

Computation of the Bhattacharyya distance is an easy matter for the case of
two single Gaussian PDFs, as the measure takes the product of the two PDFs, which
results directly in another Gaussian (with scaled volume) after completing the square
(similarly to the development in Appendix A.1). As all terms are multiplicative,
the square root can be taken of each term individually, thus the result is indeed a
Gaussian form.
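As a hedged illustration of the single-Gaussian case, the sketch below uses the standard closed-form Bhattacharyya distance between two scalar Gaussians and compares it with a direct numerical evaluation of $-\ln \int \sqrt{f_1 f_2}\,\mathrm{d}x$. The function names, the test values and the restriction to scalar states are assumptions made for brevity.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def bhattacharyya_distance(m1, v1, m2, v2):
    """Closed-form Bhattacharyya distance between two scalar Gaussians."""
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return mean_term + var_term


def bhattacharyya_coefficient_numeric(m1, v1, m2, v2, lo=-30.0, hi=30.0, steps=60000):
    """Midpoint-rule estimate of the integral of sqrt(f1 * f2)."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += math.sqrt(gauss_pdf(x, m1, v1) * gauss_pdf(x, m2, v2))
    return total * dx


closed = bhattacharyya_distance(0.0, 1.0, 2.0, 4.0)
numeric = -math.log(bhattacharyya_coefficient_numeric(0.0, 1.0, 2.0, 4.0))
print(closed, numeric)
```

The two values agree to numerical precision for a pair of single Gaussians; no comparable closed form is available once either PDF is a mixture, which is the difficulty discussed next.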

In the case of Gaussian mixtures, the product of the two PDFs results in a
sum with a term for each pairing of mixture components from the two PDFs, as
per the cost function component $J_{hr}$, expanded in Eq. (3.22). The square root of
this sum cannot be simplified in general, so computationally intractable numerical
integration methods would be necessary.

3.3.1.2 Kolmogorov Variational Distance. The Kolmogorov variational
distance, defined in Eq. (2.74), is the integral of the absolute difference between
the two probability density functions:

$$J_K = \int \left| f\{X(k) \mid \Omega_{N_h(k)}\} - f\{X(k) \mid \Omega_{N_r(k)}\} \right| \mathrm{d}X(k)$$

The measure provides an indication of the amount of probability mass by which
the two PDFs differ. If the two functions are identical throughout the probability
space, then the cost will be zero. Conversely, if the two functions are entirely disjoint,
then the cost will merely be the sum of the integrals of each PDF individually
(each evaluating to unity), giving the maximum value of two.

Like the Bhattacharyya distance, the Kolmogorov variational distance is not
easily evaluated. The absolute value function requires piecewise definition, thus the
integral must be divided into two portions: where the original PDF is larger
in value than the approximation, and where the approximation is larger in value
than the original PDF. While the integral over an entire Gaussian function is easily
evaluated (indeed it is unity by definition), the integral over an arbitrary portion
of a Gaussian function is extremely difficult to evaluate, and resorting to numerical
techniques or even Monte Carlo methods will be inevitable.
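To make the need for numerical evaluation concrete, the following sketch estimates the Kolmogorov variational distance between scalar Gaussian mixtures with a simple midpoint rule. The mixtures, the integration limits and the function names are assumptions chosen for illustration; a one-dimensional grid of this kind does not scale to the joint state dimensions of a real tracker.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, comps):
    """Evaluate a scalar Gaussian mixture given as (weight, mean, variance) tuples."""
    return sum(w * gauss_pdf(x, m, v) for w, m, v in comps)


def kolmogorov_distance(f, g, lo, hi, steps=20000):
    """Midpoint-rule estimate of the integral of |f - g| over [lo, hi]."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += abs(mixture_pdf(x, f) - mixture_pdf(x, g))
    return total * dx


original = [(0.5, -1.0, 0.25), (0.5, 1.0, 0.25)]
merged = [(1.0, 0.0, 1.25)]      # single Gaussian with the same mean and variance
far_away = [(1.0, 30.0, 1.0)]    # essentially disjoint from the original

d_same = kolmogorov_distance(original, original, -40.0, 40.0)
d_merged = kolmogorov_distance(original, merged, -40.0, 40.0)
d_disjoint = kolmogorov_distance(original, far_away, -40.0, 40.0)
print(d_same, d_merged, d_disjoint)
```

The identical pair gives zero and the disjoint pair approaches the upper limit of two, matching the two limiting cases described above.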

3.3.1.3 Integral Square Difference Measure. The use of the absolute value
function in the Kolmogorov variational distance provides an even nonlinearity, which
forces positive and negative differences to be handled identically. Another nonlinearity
which could be used in place of the absolute value is a square function, resulting in
the following modified cost:

$$J_S = \int \left( f\{X(k) \mid \Omega_{N_h(k)}\} - f\{X(k) \mid \Omega_{N_r(k)}\} \right)^2 \mathrm{d}X(k) \qquad (3.9)$$

where the subscript ‘S’ is used to denote the square nonlinearity. This measure will
be referred to as the Integral Square Difference (ISD) cost function. The nonlinearity
could be replaced with any even integer power, where higher powers will tend to treat
areas of larger error with increasingly higher weight than those of lower error. In
the limit, as the power approaches infinity (through even values), the cost function
will apply all priority to the largest error point, tending to minimize the maximum
error committed by the approximation. This is illustrated in Figure 3.5, which
compares the absolute value function to other even nonlinear functions, $x^2$, $x^4$ and
$x^6$. To interpret the diagram, consider the case in which the maximum error of the
approximation is unity, which each function maps to the same value. The difference
between the various nonlinearities is then the amount of weight applied to points
with comparatively lower error: the absolute value function applies weight which
reduces linearly with the error, while $x^2$, $x^4$ and $x^6$ apply weights which decrease at
a faster rate as the order of the nonlinearity increases.

Figure 3.5. Comparison of various even nonlinearities (plot title: Even Nonlinearities).

The implication of Figure 3.5 for the ISD cost function is that the ISD measure
will behave very similarly to the Kolmogorov variational distance, but comparatively
higher weight will be applied to areas of larger error, while comparatively
lower weight will be applied to areas with lower error. If the original PDF contains
components with very small variances producing large peaks alongside components of
similar probability weight with larger variances producing smaller and flatter peaks,
then one can expect that the square nonlinearity will give higher cost to the lower
variance components (with higher peaks) than to the higher variance components
(with lower peaks), where the Kolmogorov variational distance would treat
the two identically.
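This difference in emphasis can be checked with a small numerical experiment. The sketch assumes a scalar mixture with one narrow and one wide component of equal weight, and compares the cost of misplacing each component so far away that it no longer overlaps its original position. All values and names are illustrative assumptions, and both costs are evaluated by a midpoint rule.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, comps):
    return sum(w * gauss_pdf(x, m, v) for w, m, v in comps)


def integrate(func, lo, hi, steps=40000):
    """Midpoint-rule integral of func over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * dx) for k in range(steps)) * dx


def costs(original, approx, lo=-40.0, hi=60.0):
    """Return (Kolmogorov variational distance, ISD) between two scalar mixtures."""
    def diff(x):
        return mixture_pdf(x, original) - mixture_pdf(x, approx)
    return integrate(lambda x: abs(diff(x)), lo, hi), integrate(lambda x: diff(x) ** 2, lo, hi)


narrow = (0.5, -10.0, 0.01)   # (weight, mean, variance): tall, thin peak
wide = (0.5, 10.0, 4.0)       # same weight, low and flat peak
original = [narrow, wide]
narrow_misplaced = [(0.5, -30.0, 0.01), wide]
wide_misplaced = [narrow, (0.5, 40.0, 4.0)]

jk_narrow, js_narrow = costs(original, narrow_misplaced)
jk_wide, js_wide = costs(original, wide_misplaced)
print(jk_narrow, jk_wide)
print(js_narrow, js_wide)
```

Under these assumptions the Kolmogorov variational distance is the same for both errors, since each displaces the same probability mass, whereas the ISD cost of misplacing the narrow component is roughly twenty times that of misplacing the wide one.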

3.3.1.4 Maximum Likelihood Measure. The field of Maximum Likelihood
estimation is based upon finding the parameters of a known distribution form
that maximize the likelihood of receiving a set of data, assuming that it was drawn
from the given form of distribution. If a single datum vector $z$ is received, then the
most likely value of the parameter $\theta$ can be found from [48:210]:

$$\hat{\theta} = \arg\max_{\theta} f\{z \mid \theta\} \qquad (3.10)$$

It is often more convenient to perform this optimization in terms of the natural
logarithm of the density rather than the density itself. Since $\log x$ is a monotonically
increasing function for $x > 0$, the peak of the logarithm of the density occurs at
the same location as the peak of the density itself. The logarithm of the density is
commonly referred to as the log-likelihood function:

$$L(\theta, z) = \log f\{z \mid \theta\} \qquad (3.11)$$

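Assuming that a set of data vectors $Z = \{z_1, \ldots, z_n\}$ were drawn from the
density of interest such that they are independent and identically distributed, the
joint density of the data may be written as:

$$f\{Z \mid \theta\} = \prod_{i=1}^{n} f\{z_i \mid \theta\} \qquad (3.12)$$

such that the most likely value for the parameter vector $\theta$ may be found by:

$$\begin{aligned}
\hat{\theta} &= \arg\max_{\theta} f\{Z \mid \theta\} \\
&= \arg\max_{\theta} \prod_{i=1}^{n} f\{z_i \mid \theta\}
\end{aligned} \qquad (3.13)$$

This expression is conveniently simplified using the logarithm operation as
discussed above:

$$\begin{aligned}
\hat{\theta} &= \arg\max_{\theta} \log f\{Z \mid \theta\} \\
&= \arg\max_{\theta} \log \prod_{i=1}^{n} f\{z_i \mid \theta\} \\
&= \arg\max_{\theta} \sum_{i=1}^{n} \log f\{z_i \mid \theta\} \\
&= \arg\max_{\theta} \sum_{i=1}^{n} L(\theta, z_i)
\end{aligned} \qquad (3.14)$$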
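As a small worked instance of the summed log-likelihood, the sketch below estimates the mean of a scalar Gaussian with known variance by maximizing the sum of log densities over a grid of candidate values. The data are synthetic, and the grid search, the seed and the function name are assumptions made purely for illustration.

```python
import math
import random


def log_likelihood(theta, data, var=1.0):
    """Sum of log N{z_i; theta, var} over the data set."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (z - theta) ** 2 / var
        for z in data
    )


random.seed(1)
data = [random.gauss(3.0, 1.0) for _ in range(200)]   # synthetic i.i.d. samples

grid = [2.0 + 0.01 * k for k in range(201)]           # candidate values of theta
theta_hat = max(grid, key=lambda theta: log_likelihood(theta, data))
sample_mean = sum(data) / len(data)
print(theta_hat, sample_mean)
```

For this model the maximizing value coincides with the sample mean to within the grid spacing, as expected for a Gaussian with known variance.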


The expression of Eq. (3.14) is used to derive the Expectation Maximization
(EM) algorithm for determining the parameters of the Gaussian mixture which best
match a given set of data [4T]. As the data sample increases in size, it provides a
more detailed representation of the true data PDF, resulting in a set of parameters
which better represents the distribution. In the limit as the sample size approaches
infinity, the data sample converges to represent the true PDF; normalizing the sum
by the sample size $n$, which does not change the maximizing argument, gives:

$$\begin{aligned}
\hat{\theta} &= \arg\max_{\theta} \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \log f\{z_i \mid \theta\} \\
&= \arg\max_{\theta} \int f\{z\} \log f\{z \mid \theta\} \, \mathrm{d}z
\end{aligned} \qquad (3.15)$$


The above derivation, resulting in Eq. (3.15), provides a means for finding the
parameters $\theta$ of the density form $f\{z \mid \theta\}$ which best match the true data density
$f\{z\}$. This problem is equivalent to that posed in this study: we want to solve
for the parameters of a Gaussian mixture which provide the best fit to a mixture of
higher complexity. Hence a natural measure of the fit of the reduced density using
parameters $\Omega_{N_r(k)}$ to the higher-order density represented by parameters $\Omega_{N_h(k)}$ is
provided by the cost function defined by:

$$J_{ML} = \int f\{X(k) \mid \Omega_{N_h(k)}\} \log f\{X(k) \mid \Omega_{N_r(k)}\} \, \mathrm{d}X(k) \qquad (3.16)$$

Following from this interpretation, the author considers this expression to be the ideal
cost function for the application. However, the logarithm of a Gaussian mixture
cannot be simplified, hence the cost function cannot be evaluated without
numerical integration or approximation.
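One way to approximate Eq. (3.16) is the sample average suggested by Eq. (3.15): draw samples from the original mixture and average the log of the reduced density. The sketch below does this for scalar mixtures; the sampler, the sample size, the seed and all parameter values are assumptions for illustration rather than the method used in this study.

```python
import math
import random


def log_gauss(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def log_mixture(x, comps):
    """Log of a scalar Gaussian mixture, using log-sum-exp for numerical safety."""
    terms = [math.log(w) + log_gauss(x, m, v) for w, m, v in comps]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def sample_mixture(comps, rng):
    """Draw one sample from a mixture of (weight, mean, variance) tuples."""
    u, acc = rng.random(), 0.0
    for w, m, v in comps:
        acc += w
        if u <= acc:
            return rng.gauss(m, math.sqrt(v))
    return rng.gauss(comps[-1][1], math.sqrt(comps[-1][2]))


def j_ml_monte_carlo(original, reduced, n=20000, seed=0):
    """Average of log f_r over samples drawn from the original mixture."""
    rng = random.Random(seed)
    return sum(log_mixture(sample_mixture(original, rng), reduced) for _ in range(n)) / n


original = [(0.5, -1.0, 0.25), (0.5, 1.0, 0.25)]
j_same = j_ml_monte_carlo(original, original)
j_merged = j_ml_monte_carlo(original, [(1.0, 0.0, 1.25)])   # moment-matched Gaussian
j_offset = j_ml_monte_carlo(original, [(1.0, 3.0, 1.25)])   # same shape, poorly placed
print(j_same, j_merged, j_offset)
```

Larger values indicate a better fit, so the measure is maximized rather than minimized: the original mixture scores highest against itself, the moment-matched Gaussian next, and the poorly placed Gaussian lowest.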

The expression of Eq. (3.16) somewhat resembles the divergence as defined
in [22:6]:

$$\begin{aligned}
J(1,2) &= \int \left( f_1\{x\} - f_2\{x\} \right) \log \frac{f_1\{x\}}{f_2\{x\}} \, \mathrm{d}x \\
&= \int \big( f_1\{x\} \log f_1\{x\} - f_1\{x\} \log f_2\{x\} \\
&\qquad - f_2\{x\} \log f_1\{x\} + f_2\{x\} \log f_2\{x\} \big) \, \mathrm{d}x
\end{aligned} \qquad (3.17)$$

The divergence is a measure of the difficulty of discriminating from which
distribution ($f_1\{x\}$ or $f_2\{x\}$) a sample vector was drawn, and hence a measure of the
similarity between them. Comparing Eq. (3.17) to Eq. (3.16), the divergence consists
of four terms: the information content (entropy) [32:166] of each distribution, and
the “cross-entropy” terms, one of which can be seen to be the negative of the
Maximum Likelihood cost function, as defined in Eq. (3.16).
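The divergence can be evaluated numerically for a simple case as a sanity check. The sketch assumes two scalar Gaussians of equal variance $\sigma^2$, for which the divergence reduces to $(\mu_1 - \mu_2)^2 / \sigma^2$; the function names and integration settings are illustrative assumptions.

```python
import math


def log_gauss(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def divergence(m1, v1, m2, v2, lo=-20.0, hi=20.0, steps=40000):
    """Midpoint-rule estimate of the integral of (f1 - f2) * log(f1 / f2)."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        l1, l2 = log_gauss(x, m1, v1), log_gauss(x, m2, v2)
        # Work with log densities so the ratio is well defined in the tails.
        total += (math.exp(l1) - math.exp(l2)) * (l1 - l2)
    return total * dx


j12 = divergence(0.0, 1.0, 1.5, 1.0)
j21 = divergence(1.5, 1.0, 0.0, 1.0)
print(j12, j21)
```

The result is symmetric in the two densities, unlike the single cross term of Eq. (3.16), which is weighted by the original PDF only.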

3.3.2 Analysis of Integral Square Difference Measure. Of the cost function
options analyzed, the ISD measure is the only one which leads
to a cost function that can be evaluated in closed form, without requiring expensive
numerical integration. Furthermore, as will be seen in Section 3.3.3, the derivatives
of the cost function with respect to each of the parameters can also be evaluated
in closed form, allowing iterative optimization techniques to be employed efficiently.
On an intuitive level, the cost function does not have the appealing probability
mass interpretation of the Kolmogorov variational distance, or the optimal reduced
parameter fit interpretation of the Maximum Likelihood measure; however, it is a
reasonable measure of the distance between the two PDFs. The function reaches its
lowest possible value of zero when the two PDFs are identical throughout the space,⁶
and its maximum possible value when the two PDFs are completely disjoint (i.e.,
when the product of the two PDFs is zero at every point in the space).

The  ISD  distance  can  be  seen  to  be  very  similar  in  form  to  the  Kolmogorov 
variational  distance,  with  the  only  disparity  being  the  squaring  of  the  integrand  in 
the  former.  Considering  the  impact  of  this  squaring,  it  will  tend  to  treat  regions  in 
which  the  difference  between  the  PDFs  is  smaller  with  lower  weight  than  the  absolute 
difference  method,  and  regions  in  which  the  difference  is  larger  with  higher  weight. 
Accordingly,  we  can  expect  that  the  result  obtained  using  the  ISD  method  will  be 
more  averse  to  committing  larger  errors,  and  less  averse  to  committing  smaller  value 
errors,  even  if  the  volume  contained  in  the  errors  is  the  same.  Considering  this  in  the 
context  of  a  Gaussian  mixture  PDF,  we  expect  that  the  ISD  measure  will  give  more 
consideration  to  mixture  components  with  lower  variance  (and  thus  higher  value) 
over  mixture  components  with  higher  variance,  even  if  the  probability  weights  are 
identical. 

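Expanding the ISD distance measure equation yields the following terms:

$$\begin{aligned}
J_S = \int \Big( & f\{X(k) \mid \Omega_{N_h(k)}\}^2 \\
& - 2 f\{X(k) \mid \Omega_{N_h(k)}\} \, f\{X(k) \mid \Omega_{N_r(k)}\} \\
& + f\{X(k) \mid \Omega_{N_r(k)}\}^2 \Big) \, \mathrm{d}X(k)
\end{aligned} \qquad (3.18)$$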

The three terms of Eq. (3.18) each have their own interpretation. The first represents
the self-likeness of the original PDF: this term will be larger if the PDF is more
concentrated in space, and smaller if the PDF is more spread out. The second
represents the cross-likeness of the original PDF to the new PDF. This term is
critical to the function as it directly measures the volume of probability mass⁷ that
the two functions have in common. The final term is the self-likeness of the reduced
PDF, possessing similar characteristics to the other self-likeness term. The
cross-likeness term serves to balance the two self-likeness terms, cancelling the overall cost
function value to zero if the two functions are identical, and increasing the overall
cost function value as the difference between the functions increases.

6 Except at points of zero probability mass.

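Defining these three components as:

$$\begin{aligned}
J_{hh} &= \int f\{X(k) \mid \Omega_{N_h(k)}\}^2 \, \mathrm{d}X(k) \\
J_{hr} &= \int f\{X(k) \mid \Omega_{N_h(k)}\} \, f\{X(k) \mid \Omega_{N_r(k)}\} \, \mathrm{d}X(k) \\
J_{rr} &= \int f\{X(k) \mid \Omega_{N_r(k)}\}^2 \, \mathrm{d}X(k)
\end{aligned} \qquad (3.19)$$

we can then write Eq. (3.18) as:

$$J_S = J_{hh} - 2 J_{hr} + J_{rr} \qquad (3.20)$$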


7 Although it is in square units rather than the units typically associated with probability mass
measure.

In the problem of interest, the two PDFs are both Gaussian mixtures, which
can be expanded as:

$$\begin{aligned}
f\{X(k) \mid \Omega_{N_h(k)}\} &= \sum_{i=1}^{N_h(k)} p_i \, \mathcal{N}\{X; \mu_i, P_i\} \\
f\{X(k) \mid \Omega_{N_r(k)}\} &= \sum_{i=1}^{N_r(k)} \bar{p}_i \, \mathcal{N}\{X; \bar{\mu}_i, \bar{P}_i\}
\end{aligned} \qquad (3.21)$$

where $\{p_i, \mu_i, P_i\}$ are the weights, means and covariances of the Gaussian functions
composing the mixture for the original PDF, and $\{\bar{p}_i, \bar{\mu}_i, \bar{P}_i\}$ are the same parameters
of the reduced PDF. Substituting these expressions into Eq. (3.19):


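$$\begin{aligned}
J_{hr} &= \int \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_r(k)} p_i \, \mathcal{N}\{X; \mu_i, P_i\} \, \bar{p}_j \, \mathcal{N}\{X; \bar{\mu}_j, \bar{P}_j\} \, \mathrm{d}X(k) \\
J_{rr} &= \int \sum_{i=1}^{N_r(k)} \sum_{j=1}^{N_r(k)} \bar{p}_i \, \mathcal{N}\{X; \bar{\mu}_i, \bar{P}_i\} \, \bar{p}_j \, \mathcal{N}\{X; \bar{\mu}_j, \bar{P}_j\} \, \mathrm{d}X(k) \\
J_{hh} &= \int \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_h(k)} p_i \, \mathcal{N}\{X; \mu_i, P_i\} \, p_j \, \mathcal{N}\{X; \mu_j, P_j\} \, \mathrm{d}X(k)
\end{aligned} \qquad (3.22)$$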


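By linearity of the integration operation, the order of summation and integration in
each of the expressions in Eq. (3.22) can be exchanged, resulting in:

$$\begin{aligned}
J_{hr} &= \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_r(k)} p_i \bar{p}_j \int \mathcal{N}\{X; \mu_i, P_i\} \, \mathcal{N}\{X; \bar{\mu}_j, \bar{P}_j\} \, \mathrm{d}X(k) \\
J_{rr} &= \sum_{i=1}^{N_r(k)} \sum_{j=1}^{N_r(k)} \bar{p}_i \bar{p}_j \int \mathcal{N}\{X; \bar{\mu}_i, \bar{P}_i\} \, \mathcal{N}\{X; \bar{\mu}_j, \bar{P}_j\} \, \mathrm{d}X(k) \\
J_{hh} &= \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_h(k)} p_i p_j \int \mathcal{N}\{X; \mu_i, P_i\} \, \mathcal{N}\{X; \mu_j, P_j\} \, \mathrm{d}X(k)
\end{aligned} \qquad (3.23)$$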


Following from the derivation of Appendix A.1, the product of two Gaussian
PDFs, which forms the basic building block of Eq. (3.23), can be simplified to the
following form:

$$\mathcal{N}\{x; \mu_1, P_1\} \, \mathcal{N}\{x; \mu_2, P_2\} = \alpha \, \mathcal{N}\{x; \mu_3, P_3\} \qquad (3.24)$$

where $\alpha$, $\mu_3$ and $P_3$ are as given by Eq. (A.13):

$$\begin{aligned}
\alpha &= \mathcal{N}\{\mu_1; \mu_2, P_1 + P_2\} \\
P_3 &= (P_1^{-1} + P_2^{-1})^{-1} \\
\mu_3 &= P_3 (P_1^{-1} \mu_1 + P_2^{-1} \mu_2)
\end{aligned}$$
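The product identity can be verified pointwise in the scalar case, where the matrix inverses reduce to reciprocals. The sketch below is a numerical check only; the function names and the test values are assumptions for illustration.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def product_parameters(m1, p1, m2, p2):
    """Scalar form of alpha, mu_3 and P_3 for the product N{x; m1, p1} N{x; m2, p2}."""
    alpha = gauss_pdf(m1, m2, p1 + p2)
    p3 = 1.0 / (1.0 / p1 + 1.0 / p2)
    m3 = p3 * (m1 / p1 + m2 / p2)
    return alpha, m3, p3


m1, p1, m2, p2 = 0.0, 1.0, 2.0, 4.0
alpha, m3, p3 = product_parameters(m1, p1, m2, p2)

# Largest pointwise discrepancy between the product and the scaled Gaussian.
worst = max(
    abs(gauss_pdf(x, m1, p1) * gauss_pdf(x, m2, p2) - alpha * gauss_pdf(x, m3, p3))
    for x in [-3.0, -1.0, 0.0, 0.4, 1.0, 2.5, 5.0]
)
print(alpha, m3, p3, worst)
```

Because the scaled Gaussian on the right-hand side integrates to $\alpha$, the integral of the product of two Gaussian PDFs is simply $\alpha$, which is what makes the closed-form evaluation that follows possible.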

Substituting this simplification into the expressions of Eq. (3.23) and noticing that
the integral over a Gaussian PDF evaluates to unity, we find:

$$\begin{aligned}
J_{hr} &= \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_r(k)} p_i \bar{p}_j \, \mathcal{N}\{\mu_i; \bar{\mu}_j, P_i + \bar{P}_j\} \int \mathcal{N}\{X; \mu_a, P_a\} \, \mathrm{d}X(k) \\
&= \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_r(k)} p_i \bar{p}_j \, \mathcal{N}\{\mu_i; \bar{\mu}_j, P_i + \bar{P}_j\} \\
J_{rr} &= \sum_{i=1}^{N_r(k)} \sum_{j=1}^{N_r(k)} \bar{p}_i \bar{p}_j \, \mathcal{N}\{\bar{\mu}_i; \bar{\mu}_j, \bar{P}_i + \bar{P}_j\} \int \mathcal{N}\{X; \mu_b, P_b\} \, \mathrm{d}X(k) \\
&= \sum_{i=1}^{N_r(k)} \sum_{j=1}^{N_r(k)} \bar{p}_i \bar{p}_j \, \mathcal{N}\{\bar{\mu}_i; \bar{\mu}_j, \bar{P}_i + \bar{P}_j\} \\
J_{hh} &= \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_h(k)} p_i p_j \, \mathcal{N}\{\mu_i; \mu_j, P_i + P_j\} \int \mathcal{N}\{X; \mu_c, P_c\} \, \mathrm{d}X(k) \\
&= \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_h(k)} p_i p_j \, \mathcal{N}\{\mu_i; \mu_j, P_i + P_j\}
\end{aligned} \qquad (3.25)$$

where $\mu_a$, $\mu_b$, $\mu_c$, $P_a$, $P_b$ and $P_c$ are the combined means and covariances of the
respective Gaussian component pairs from Eq. (3.23), calculated according to
Eq. (3.24). Interpreting Eqs. (3.20) and (3.25), the cost function consists of the
sum of similarity measures of all pairs of two components from the original mixture,
plus similarity measures of all pairs of two components from the reduced mixture,
balanced by the sum of similarity measures of all pairs of one component from the
original mixture and one component from the reduced mixture.

3.3.2.1 Normalization. While the ISD cost function will have a minimum
value of zero (corresponding to the case in which the PDFs are identical), the
peak value of the cost (corresponding to the case in which the PDFs are essentially
disjoint) will vary depending on the PDFs under consideration. If it is desirable for
the cost function to be bounded, it could be normalized using an expression such as:

$$\begin{aligned}
\tilde{J}_S &= \frac{\int \left( f\{X(k) \mid \Omega_{N_h(k)}\} - f\{X(k) \mid \Omega_{N_r(k)}\} \right)^2 \mathrm{d}X(k)}{\int f\{X(k) \mid \Omega_{N_h(k)}\}^2 + f\{X(k) \mid \Omega_{N_r(k)}\}^2 \, \mathrm{d}X(k)} \\
&= 1 - \frac{2 \int f\{X(k) \mid \Omega_{N_h(k)}\} \, f\{X(k) \mid \Omega_{N_r(k)}\} \, \mathrm{d}X(k)}{\int f\{X(k) \mid \Omega_{N_h(k)}\}^2 + f\{X(k) \mid \Omega_{N_r(k)}\}^2 \, \mathrm{d}X(k)}
\end{aligned} \qquad (3.26)$$

which will result in a function which is bounded between zero and one, a desirable
characteristic if a fixed threshold is to be utilized to limit the maximum allowable
cost incurred by the PDF reduction. In this study, the reduction was performed until
the desired number of components was achieved, hence bounding was not utilized.
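The closed-form evaluation and its normalized variant can be sketched for scalar mixtures as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this study: states are scalar (a multivariate version would replace `gauss_pdf` with a multivariate normal density and add covariance matrices), the mixtures are hypothetical, and the helper names are invented for the example. A midpoint-rule integral is included only as a cross-check.

```python
import math


def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, comps):
    return sum(w * gauss_pdf(x, m, v) for w, m, v in comps)


def likeness(mix_a, mix_b):
    """Closed-form integral of the product of two scalar Gaussian mixtures."""
    return sum(
        wa * wb * gauss_pdf(ma, mb, va + vb)
        for wa, ma, va in mix_a
        for wb, mb, vb in mix_b
    )


def isd_cost(original, reduced):
    """ISD cost as J_hh - 2 J_hr + J_rr."""
    return (likeness(original, original)
            - 2.0 * likeness(original, reduced)
            + likeness(reduced, reduced))


def normalized_isd_cost(original, reduced):
    """ISD cost divided by J_hh + J_rr, which bounds it between zero and one."""
    return isd_cost(original, reduced) / (
        likeness(original, original) + likeness(reduced, reduced))


def isd_numeric(original, reduced, lo, hi, steps=40000):
    """Midpoint-rule check of the integral of the squared difference."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        total += (mixture_pdf(x, original) - mixture_pdf(x, reduced)) ** 2
    return total * dx


# Mixtures are lists of (weight, mean, variance) tuples.
original = [(0.3, -2.0, 0.5), (0.5, 0.0, 1.0), (0.2, 3.0, 0.2)]
reduced = [(0.8, -0.75, 1.8), (0.2, 3.0, 0.2)]
far_away = [(1.0, 50.0, 1.0)]

closed = isd_cost(original, reduced)
numeric = isd_numeric(original, reduced, -20.0, 20.0)
print(closed, numeric)
print(normalized_isd_cost(original, reduced), normalized_isd_cost(original, far_away))
```

The closed-form cost needs only one Gaussian evaluation per component pair, so its cost grows with the product of the component counts and not with the fineness of an integration grid.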

3.3.3 Iterative Optimization. The cost function described in Section 3.3.2
provides a measure of the dissimilarity between two Gaussian mixtures, and has the
following desirable characteristics:

1. The equation for $J_S$ can be evaluated completely in closed form, resulting in
a sum of multivariate Gaussian functions with one term for each pairing of
components in the original and reduced mixtures.

2. The resulting closed form evaluation is continuously differentiable, hence standard
gradient-based optimization techniques can be employed.

3. As we will see in the following pages, the expressions for the first gradient of
the cost function can also be written in closed form using standard vector-matrix
notation, again simplifying the employment of gradient-based iterative
optimization techniques.



Thus, by careful selection of cost function, we are able to apply the optimization
methods described in Section 2.6 to find the parameters of the reduced order
Gaussian mixture that provide the best fit to the higher order function. The parameters
to be optimized are (recalling Eq. (3.21)):

1. The probability weights of the reduced mixture, $\{p_i\}$.

2. The mean vectors of the reduced mixture, $\{\mu_i\}$.

3. The covariance matrices of the reduced mixture, $\{P_i\}$.

In  order  to  produce  a  valid  Gaussian  mixture  as  an  output,  there  are  three 
constraints  for  the  optimization: 

1. The probability weights must be non-negative: $p_i \ge 0 \;\; \forall\, i$.

2. The probability weights must sum to unity: $\sum_i p_i = 1$.

3. The covariance matrices must be positive-definite: $x^T P_i x > 0 \;\; \forall\, i$ and all $x \neq 0$.

Such constraints complicate the optimization greatly, requiring the addition of
parameters such as Lagrange multipliers [36:152-153]. If the optimization is re-posed
in terms of transformed parameters such that these constraints are guaranteed to
be satisfied, this complexity is completely avoided, and much simpler unconstrained
optimization methods can be employed. One set of transformations which produces
this result is:

$$
\begin{aligned}
p_i &= \frac{q_i^2}{\sum_j q_j^2} \\
P_i &= L_i L_i^T
\end{aligned}
\tag{3.27}
$$

The first line in Eq. (3.27) replaces the probability weights $\{p_i\}$ with a
set of functions of $\{q_i\}$, which, for the optimization performed in terms of these
transformed variables, guarantees that the resultant $\{p_i\}$ set will be non-negative
and sum to unity. The second line of Eq. (3.27) replaces each covariance matrix
with a matrix square, a form which is guaranteed to produce a positive semi-definite
matrix [52:333]. To ensure positive definiteness (i.e., no zero eigenvalues), we rely on
the premise that, if the original mixture to which we are fitting the reduced-order
density has no singular covariance components, then the optimization is unlikely
to pull towards such a solution. In other words, the positive definiteness of the
solution is not strictly guaranteed, but this is unlikely to cause difficulties in any
physically-motivated problem.

To  commence  the  iteration,  the  transformed  parameters  can  be  determined 
from  the  initial  values  of  the  reduced  parameters  by  the  following  relationships: 

$$
\begin{aligned}
q_i &= \sqrt{p_i} \\
L_i &= \sqrt{P_i}
\end{aligned}
\tag{3.28}
$$

where $\sqrt{P_i}$ denotes the Cholesky square root, as defined in [34:370].

After some experimentation, it was decided that the constraint for the probability
weights to sum to unity was unnecessary. To understand this, consider the
problem in which a density that has several peaks, all of which are separated from
each other, is to be reduced to such an extent that not all of these peaks can be
modelled. If the probability weights are constrained such that they must sum to
unity, then the weights of the remainder of the components must be increased to
take up the weight of the components which are no longer being modelled, and this
increase in the remaining weights will create additional error in the area of the
remaining components. Thus, if we want the only cost associated with the deletion
of a component to be the error induced in the region of that component, then the
other components' weights should not be forced to be adjusted. Accordingly, during
the optimization process, we allow the weights to be de-normalized, drifting to whatever
total value (virtually always close to unity) provides the best fit to the original
function. Normalization is then applied as a final step after the optimization process
has been completed.
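
The parameter transformations discussed above can be sketched as follows. This is an illustrative pure-Python fragment, not the thesis implementation: matrices are assumed to be small lists of rows, and the `normalize` flag is a hypothetical switch between the sum-to-unity form and the de-normalized form in which only non-negativity is enforced.

```python
import math


def weights_from_q(q, normalize=True):
    """Map unconstrained q_i to weights p_i = q_i^2, optionally / sum_j q_j^2."""
    squares = [qi * qi for qi in q]
    if not normalize:
        return squares          # de-normalized weights: non-negative only
    total = sum(squares)
    return [s / total for s in squares]


def covariance_from_factor(L):
    """Return P = L L^T for a square matrix L given as a list of rows."""
    n = len(L)
    return [[sum(L[r][k] * L[c][k] for k in range(n)) for c in range(n)]
            for r in range(n)]


def cholesky(P):
    """Lower-triangular Cholesky factor of a positive-definite matrix P.

    Used only to obtain the starting value of L from an initial covariance.
    """
    n = len(P)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = P[r][c] - sum(L[r][k] * L[c][k] for k in range(c))
            L[r][c] = math.sqrt(s) if r == c else s / L[c][c]
    return L
```

Any real-valued `q` and `L` map back to admissible weights and a positive semi-definite covariance, which is why an unconstrained search can be run directly on the transformed parameters.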

Following this discussion, the transformation of the probability weights of
Eq. (3.27) was changed to:

$$
p_i = q_i^2 \tag{3.29}
$$

and the initial value of the $\{q_i\}$ parameters remains as per Eq. (3.28). The cost
function must then be re-written in terms of the transformed parameters, resulting
in the following expression:


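$$
J_S = J_{hh} - 2 J_{hr} + J_{rr}
$$

where, with barred symbols denoting the parameters of the original mixture:

$$
\begin{aligned}
J_{hr} &= \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_r(k)} \bar{p}_i\, q_j^2\,
          \mathcal{N}\{\bar{\mu}_i;\ \mu_j,\ \bar{P}_i + L_j L_j^T\} \\
J_{rr} &= \sum_{i=1}^{N_r(k)} \sum_{j=1}^{N_r(k)} q_i^2\, q_j^2\,
          \mathcal{N}\{\mu_i;\ \mu_j,\ L_i L_i^T + L_j L_j^T\} \\
J_{hh} &= \sum_{i=1}^{N_h(k)} \sum_{j=1}^{N_h(k)} \bar{p}_i\, \bar{p}_j\,
          \mathcal{N}\{\bar{\mu}_i;\ \bar{\mu}_j,\ \bar{P}_i + \bar{P}_j\}
\end{aligned}
\tag{3.30}
$$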

In  the  following  sections,  the  derivatives  of  these  components,  Jhr ,  Jrr  and  Jhh, 
are  calculated  separately  with  respect  to  each  parameter.  The  Jhh  equation  contains 
only  components  from  the  original  mixture,  hence  the  derivative  of  it  with  respect 
to  parameters  of  the  reduced  mixture  evaluates  to  zero. 

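3.3.3.1 Derivatives with respect to weights. The expressions in
Eq. (3.31) below show the partial derivatives of the cost function components $J_{hr}$
and $J_{rr}$ with respect to the probability weights $\{q_j\}$:

$$
\begin{aligned}
\frac{\partial J_{hr}}{\partial q_j}
  &= \sum_{i=1}^{N_h(k)} \bar{p}_i \cdot 2 q_j\,
     \mathcal{N}\{\bar{\mu}_i;\ \mu_j,\ \bar{P}_i + L_j L_j^T\} \\
  &= 2 q_j \sum_{i=1}^{N_h(k)} \bar{p}_i\,
     \mathcal{N}\{\bar{\mu}_i;\ \mu_j,\ \bar{P}_i + L_j L_j^T\} \\
\frac{\partial J_{rr}}{\partial q_j}
  &= 2 \sum_{\substack{i=1 \\ i \neq j}}^{N_r(k)} q_i^2 \cdot 2 q_j\,
     \mathcal{N}\{\mu_i;\ \mu_j,\ L_i L_i^T + L_j L_j^T\}
     + 4 q_j^3\, \mathcal{N}\{\mu_j;\ \mu_j,\ L_j L_j^T + L_j L_j^T\} \\
  &= 4 q_j \sum_{i=1}^{N_r(k)} q_i^2\,
     \mathcal{N}\{\mu_i;\ \mu_j,\ L_i L_i^T + L_j L_j^T\}
\end{aligned}
\tag{3.31}
$$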


3.3.3.2 Derivatives with respect to means. The derivative of a scalar
with respect to a vector is a vector in which each component is equal to the derivative
of the scalar with respect to the corresponding component of the vector:

$$
\left[ \frac{\partial J}{\partial \mu} \right]_i = \frac{\partial J}{\partial \{\mu\}_i} \tag{3.32}
$$

While the usual convention [34:23] is that the derivative of a scalar with respect to a
vector produces a row vector, the following development chooses for convenience to
define it as the column vector (the transpose of the conventional result). By applying
Eq. (3.32), one can derive relationships which allow calculation of the derivative of
common scalar-vector functions in terms of standard vector notation. One of the
most common examples of this is the vector quadratic product:

$$
\frac{\partial}{\partial \mu} \left\{ \mu^T A \mu \right\} = 2 A \mu \tag{3.33}
$$
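where $A$ is assumed symmetric. Using Eq. (3.33), we can arrive at the expressions:

$$
\begin{aligned}
\frac{\partial J_{hr}}{\partial \mu_j}
  &= \sum_{i=1}^{N_h(k)} \bar{p}_i\, q_j^2 \cdot
     -(\bar{P}_i + L_j L_j^T)^{-1} (\mu_j - \bar{\mu}_i)\,
     \mathcal{N}\{\bar{\mu}_i;\ \mu_j,\ \bar{P}_i + L_j L_j^T\} \\
  &= -q_j^2 \sum_{i=1}^{N_h(k)} \bar{p}_i\,
     (\bar{P}_i + L_j L_j^T)^{-1} (\mu_j - \bar{\mu}_i)\,
     \mathcal{N}\{\bar{\mu}_i;\ \mu_j,\ \bar{P}_i + L_j L_j^T\} \\
\frac{\partial J_{rr}}{\partial \mu_j}
  &= 2 \sum_{\substack{i=1 \\ i \neq j}}^{N_r(k)} q_i^2\, q_j^2 \cdot
     -(L_i L_i^T + L_j L_j^T)^{-1} (\mu_j - \mu_i)\,
     \mathcal{N}\{\mu_i;\ \mu_j,\ L_i L_i^T + L_j L_j^T\} \\
  &= -2 q_j^2 \sum_{i=1}^{N_r(k)} q_i^2\,
     (L_i L_i^T + L_j L_j^T)^{-1} (\mu_j - \mu_i)\,
     \mathcal{N}\{\mu_i;\ \mu_j,\ L_i L_i^T + L_j L_j^T\}
\end{aligned}
\tag{3.34}
$$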


where  the  i  =  j  term  is  included  in  the  final  equality  for  convenience,  noting  that  it 
will  evaluate  to  zero  anyway. 


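3.3.3.3 Derivatives with respect to covariances. In a manner similar
to the derivative of a scalar with respect to a vector, the derivative of a scalar with
respect to a matrix is defined as the matrix whose $(i, j)$ element is the derivative of
the scalar with respect to the $(i, j)$ element of the matrix:

$$
\left[ \frac{\partial J}{\partial A} \right]_{ij} = \frac{\partial J}{\partial \{A\}_{ij}} \tag{3.35}
$$

Expanding the Gaussian component of the form of $J_{rr}$ terms:

$$
|2\pi (L_i L_i^T + L_j L_j^T)|^{-\frac{1}{2}}
\exp\left\{ -\tfrac{1}{2} (\mu_i - \mu_j)^T (L_i L_i^T + L_j L_j^T)^{-1} (\mu_i - \mu_j) \right\}
\tag{3.36}
$$

we find that we need to calculate the derivative of the following two expressions:

$$
\begin{aligned}
\text{Leading coefficient:} \quad
  & \frac{\partial}{\partial L_i} \left| L_i L_i^T + L_j L_j^T \right|^{-\frac{1}{2}} \\
\text{Exponent:} \quad
  & \frac{\partial}{\partial L_i} (\mu_i - \mu_j)^T (L_i L_i^T + L_j L_j^T)^{-1} (\mu_i - \mu_j)
\end{aligned}
\tag{3.37}
$$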
While the results of the above derivatives are not obvious, the following results
from [20:614] and [13] give the general form of the solution:

$$
\begin{aligned}
\frac{\partial \log |X^T A X|}{\partial X} &= 2 A X (X^T A X)^{-1} \\
\frac{\partial\, \mathrm{tr}[(X^T A X)^{-1} C]}{\partial X}
  &= -A X (X^T A X)^{-1} (C + C^T) (X^T A X)^{-1}
\end{aligned}
\tag{3.38}
$$


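Defining $X = \begin{bmatrix} L_i^T \\ L_j^T \end{bmatrix}$, $A = I$ and
$C = (\mu_i - \mu_j)(\mu_i - \mu_j)^T$ such that:

$$
X^T A X = \begin{bmatrix} L_i & L_j \end{bmatrix} I
          \begin{bmatrix} L_i^T \\ L_j^T \end{bmatrix}
        = L_i L_i^T + L_j L_j^T
\tag{3.39}
$$

and

$$
\begin{aligned}
\mathrm{tr}[(X^T A X)^{-1} C]
  &= \mathrm{tr}[(L_i L_i^T + L_j L_j^T)^{-1} (\mu_i - \mu_j)(\mu_i - \mu_j)^T] \\
  &= (\mu_i - \mu_j)^T (L_i L_i^T + L_j L_j^T)^{-1} (\mu_i - \mu_j)
\end{aligned}
\tag{3.40}
$$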


we find forms that closely match our desired solution. Using the chain rule, we can
find the common expression:

$$
\frac{d\, e^{f(x)}}{dx} = \frac{d\, e^{f}}{df} \frac{d f(x)}{dx} = e^{f(x)} \frac{d f(x)}{dx}
\tag{3.41}
$$


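In order to utilize this expression, we define:

$$
\begin{aligned}
f(X) &= \log \left[ |2\pi (X^T X)|^{-\frac{1}{2}}
        \exp\left\{ -\tfrac{1}{2} (\mu_i - \mu_j)^T (X^T X)^{-1} (\mu_i - \mu_j) \right\} \right] \\
     &= -\tfrac{N}{2} \log(2\pi) - \tfrac{1}{2} \log |X^T X|
        - \tfrac{1}{2} (\mu_i - \mu_j)^T (X^T X)^{-1} (\mu_i - \mu_j)
\end{aligned}
\tag{3.42}
$$

where $N$ is the dimensionality of the mixture, such that the derivative of
Eq. (3.36) is evaluated as:

$$
\begin{aligned}
\frac{\partial}{\partial X} & \left[ |2\pi (X^T X)|^{-\frac{1}{2}}
    \exp\left\{ -\tfrac{1}{2} (\mu_i - \mu_j)^T (X^T X)^{-1} (\mu_i - \mu_j) \right\} \right] \\
  &= -\tfrac{1}{2} \frac{\partial}{\partial X}
     \left[ \log |X^T X| + (\mu_i - \mu_j)^T (X^T X)^{-1} (\mu_i - \mu_j) \right]
     \mathcal{N}\{\mu_i;\ \mu_j,\ X^T X\} \\
  &= -\left[ X (X^T X)^{-1} - X (X^T X)^{-1} (\mu_i - \mu_j)(\mu_i - \mu_j)^T (X^T X)^{-1} \right]
     \mathcal{N}\{\mu_i;\ \mu_j,\ X^T X\} \\
  &= -X (X^T X)^{-1} \left[ X^T X - (\mu_i - \mu_j)(\mu_i - \mu_j)^T \right] (X^T X)^{-1}\,
     \mathcal{N}\{\mu_i;\ \mu_j,\ X^T X\}
\end{aligned}
\tag{3.43}
$$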


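Substituting in the partitioned matrix $X$, taking the transpose and keeping only the
partition in which we are interested, we find:

$$
\begin{aligned}
\frac{\partial}{\partial L_i} & \mathcal{N}\{\mu_i;\ \mu_j,\ L_i L_i^T + L_j L_j^T\} \\
  &= \mathcal{N}\{\mu_i;\ \mu_j,\ L_i L_i^T + L_j L_j^T\}\, (L_i L_i^T + L_j L_j^T)^{-1} \cdot \\
  &\quad \cdot \left[ (\mu_i - \mu_j)(\mu_i - \mu_j)^T - (L_i L_i^T + L_j L_j^T) \right]
     (L_i L_i^T + L_j L_j^T)^{-1} L_i
\end{aligned}
\tag{3.44}
$$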


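The derivatives of Eq. (3.30) then become:

$$
\begin{aligned}
\frac{\partial J_{hr}}{\partial L_j}
  &= \sum_{i=1}^{N_h(k)} \bar{p}_i\, q_j^2\,
     \mathcal{N}\{\bar{\mu}_i;\ \mu_j,\ \bar{P}_i + L_j L_j^T\}\, (\bar{P}_i + L_j L_j^T)^{-1} \cdot \\
  &\quad \cdot \left[ (\bar{\mu}_i - \mu_j)(\bar{\mu}_i - \mu_j)^T - (\bar{P}_i + L_j L_j^T) \right]
     (\bar{P}_i + L_j L_j^T)^{-1} L_j \\
\frac{\partial J_{rr}}{\partial L_j}
  &= 2 \sum_{i=1}^{N_r(k)} q_i^2\, q_j^2\,
     \mathcal{N}\{\mu_i;\ \mu_j,\ L_i L_i^T + L_j L_j^T\}\, (L_i L_i^T + L_j L_j^T)^{-1} \cdot \\
  &\quad \cdot \left[ (\mu_i - \mu_j)(\mu_i - \mu_j)^T - (L_i L_i^T + L_j L_j^T) \right]
     (L_i L_i^T + L_j L_j^T)^{-1} L_j
\end{aligned}
\tag{3.45}
$$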


3.3.3.4 Verification. The above results were calculated by hand and
coded manually using MATLAB®. To verify the expressions, the cost function
expressions of Eq. (3.30) were entered symbolically and the derivative was calculated
with respect to each parameter using the MATLAB® Symbolic Toolbox. The resultant
expressions were then subtracted from the hand-coded expressions with symbolic
variables, and it was verified that the results of each block cancelled to zero,
indicating the algebraic equivalence of the manual calculation to the computer-aided
solution.
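
A numerical analogue of this check can be sketched with central finite differences. The fragment below is an assumption-laden illustration rather than the symbolic verification described above: it is restricted to scalar mixtures (so that each $L_j$ is a scalar with $P_j = L_j^2$), uses the de-normalized weights $p_j = q_j^2$, and all function names are hypothetical.

```python
import math


def gauss(d, s):
    """Scalar Gaussian density evaluated at offset d with variance s."""
    return math.exp(-0.5 * d * d / s) / math.sqrt(2.0 * math.pi * s)


def cost(orig, q, mu, L):
    """J_S = J_hh - 2 J_hr + J_rr with p_j = q_j^2 and P_j = L_j^2."""
    n = len(q)
    j_hh = sum(pa * pb * gauss(ma - mb, va + vb)
               for pa, ma, va in orig for pb, mb, vb in orig)
    j_hr = sum(p * q[j] ** 2 * gauss(m - mu[j], v + L[j] ** 2)
               for p, m, v in orig for j in range(n))
    j_rr = sum(q[i] ** 2 * q[j] ** 2
               * gauss(mu[i] - mu[j], L[i] ** 2 + L[j] ** 2)
               for i in range(n) for j in range(n))
    return j_hh - 2.0 * j_hr + j_rr


def gradient(orig, q, mu, L):
    """Analytic derivatives of J_S with respect to q, mu and L (scalar case)."""
    n = len(q)
    dq, dmu, dL = [0.0] * n, [0.0] * n, [0.0] * n
    for j in range(n):
        for p, m, v in orig:                 # J_hr terms, weighted by -2
            s, d = v + L[j] ** 2, m - mu[j]
            g = p * gauss(d, s)
            dq[j] -= 4.0 * q[j] * g
            dmu[j] -= 2.0 * q[j] ** 2 * g * d / s
            dL[j] -= 2.0 * q[j] ** 2 * g * (d * d - s) * L[j] / s ** 2
        for i in range(n):                   # J_rr terms (i = j included)
            s, d = L[i] ** 2 + L[j] ** 2, mu[i] - mu[j]
            g = q[i] ** 2 * gauss(d, s)
            dq[j] += 4.0 * q[j] * g
            dmu[j] += 2.0 * q[j] ** 2 * g * d / s
            dL[j] += 2.0 * q[j] ** 2 * g * (d * d - s) * L[j] / s ** 2
    return dq, dmu, dL


def max_gradient_error(orig, q, mu, L, h=1e-6):
    """Largest gap between analytic and central-difference derivatives."""
    analytic = gradient(orig, q, mu, L)
    worst = 0.0
    for block, params in enumerate((q, mu, L)):
        for j in range(len(params)):
            saved = params[j]
            params[j] = saved + h
            up = cost(orig, q, mu, L)
            params[j] = saved - h
            down = cost(orig, q, mu, L)
            params[j] = saved                # restore before the next probe
            numeric = (up - down) / (2.0 * h)
            worst = max(worst, abs(numeric - analytic[block][j]))
    return worst
```

A finite-difference comparison of this kind only shows agreement to within numerical tolerance at the chosen test point; it does not establish the algebraic equivalence that the symbolic check provides.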


3.3.3.5 Newton-Raphson Algorithm. As discussed in Section 2.6,
the Newton-Raphson algorithm operates similarly to the gradient algorithm, but
converges to the solution at a much faster rate (though from within a smaller ball of
convergence). Intuitively, utilizing full second derivative information is an extremely
desirable step, as it incorporates a great amount of information about the interaction
of the parameters into the optimization process. It was the hope of the author that,
using a carefully chosen starting point, the cost function would be well approximated
by a parabola, and the Newton-Raphson algorithm would reach the solution in one
or two iterations. These hopes were not realized, however, and the conclusion was
reached that the algorithm is inappropriate for this application.


The full Hessian matrix was calculated for the case of a scalar Gaussian mixture,
utilizing the constraint transformations described in Eq. (3.27),⁸ coded in
MATLAB® and verified using the Symbolic Toolbox as described in Section 3.3.3.4.
Simulation results revealed major difficulties with the technique. In very few steps of
the algorithm, mixture components converged toward each other, at which point the
Hessian matrix became singular and the algorithm could not continue. This reveals
the inappropriateness of the technique to this application: there are simply too many
actions which produce an equivalent result in the overall function, and the method
is attracted to points producing singularities in the Hessian. This is a commonly
understood problem of the Newton-Raphson algorithm: the method converges to
a critical point [18:101]; whether that critical point is a maximum, minimum or
saddle point (producing a singular Hessian) depends purely on the structure of the
cost function in the local region.

⁸ i.e., the probability weights were constrained to sum to unity.

Apart from the difficulties discussed above, implementation of a full
Newton-Raphson technique in any practical situation is computationally intractable. If
the Gaussian mixture contains 10 components, each of which is a six-dimensional
Gaussian PDF, then the optimization parameters will include 10 weights, 10
six-dimensional means (60 parameters), and 10 covariance matrices which, if represented
in factored triangular form, will each contain 21 parameters. The total number of
parameters is thus 280, even for this small problem, and each step will require a
280 x 280 matrix inversion.
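
The parameter count quoted above follows from simple arithmetic, sketched here with a hypothetical helper function: each component contributes one weight, a mean of dimension d, and the d(d+1)/2 entries of a triangular covariance factor.

```python
def parameter_count(num_components, dim):
    """Number of optimization parameters for a Gaussian mixture.

    Each component has one weight, `dim` mean entries, and the
    dim*(dim+1)/2 entries of a triangular covariance factor.
    """
    weights = num_components
    means = num_components * dim
    factors = num_components * dim * (dim + 1) // 2
    return weights + means + factors


# Ten six-dimensional components: 10 + 60 + 210 parameters.
total = parameter_count(10, 6)
```

The Hessian for such a problem is a square matrix with `total` rows, so its size grows with the square of the parameter count.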

While the Newton-Raphson algorithm was demonstrated to be inappropriate
for this application, the Newton-Raphson approximations (i.e., weighted gradient
algorithms) which utilize a diagonal or block-diagonal Hessian matrix (as discussed
in Section 2.6) could potentially provide a significant improvement in the rate of
convergence of the search. For the purposes of this study, however, the gradient
technique provided an adequate rate of convergence, and these modifications were not
attempted.

3.3.4 Initialization Algorithm. The cost function describing the fit of a
reduced complexity Gaussian mixture to a Gaussian mixture of higher order is an
extremely complicated multi-modal function with many peaks and troughs, of which
all but one represent local minima rather than the true global minimum. Considering
the fact that virtually all gradient-based iterative optimization methods will
converge on a local minimum, this reveals that selecting the initialization point for
the optimization is in fact the most critical function for the algorithm, more so than
the iterative optimization itself.
One could conceive of any number of algorithms that could be used for this
function. Simple component pruning (keeping the hypotheses with the $N_r$ highest
weights), as used in the standard MHT algorithm, would perhaps be the easiest
solution; any number of merging algorithms such as n-scan merging [49] or Salmond's
mixture reduction [44] would supplement the pruning well. However, such algorithms
are purely ad hoc, and there is no guarantee that the result will lead the iterative
optimization to the global minimum. In fact, there is quite probably a local minimum
close to every initialization point we could propose using such methods, as can be
expected intuitively when one considers the case in which the Gaussian mixtures in
the original model are well-spaced. In such an environment, the result generated by
the iterative optimization is only as good as the initialization point provided to the
algorithm.


The cost function proposed in Section 3.3.2 provides a systematic means of
evaluating the relative merit of two possible solutions to the optimization. This
function is utilized extensively in the iterative algorithm described in Section 3.3.3
and, considering the importance of the initialization, it is desirable to utilize the
cost function also in the algorithm that selects the starting point. It was observed
experimentally that the optimal solution generally has most of its mixture components
similar to the respective components of the original mixture, and that the
major changes produced by the reduction are that similar components are merged,
and smaller probability weight or larger variance components are deleted.⁹ These
observations support the merging and pruning methods commonly employed in MHT
implementations, but further guidance is needed towards selecting which components
should be merged, which should be deleted and which should remain unmodified.


The algorithm developed provides a systematic methodology of selecting components
for merging and deletion using the cost function described in Section 3.3.2.
The basis of the algorithm at each stage is to evaluate the cost of each possible action,
and then to select the lowest cost action. At each step, the possible actions are
to delete one of the remaining components or to merge a pair of remaining components.
This is illustrated in the block diagram of Figure 3.6. When two components
are merged, the parameters of the merged component are calculated according to
Eq. (2.24), such that the mean and covariance of the overall mixture remain
unchanged.

⁹ As discussed in Section 3.3.1.3, the ISD measure tends to favor keeping lower-variance
components and deleting higher-variance components.

There  is  no  claim  that  the  algorithm  will  produce  the  optimal  starting  point, 
nor  that  the  starting  point  will  lead  the  iterative  algorithm  to  the  global  minimum. 
Although  the  action  taken  at  each  step  leads  to  the  minimum  cost  increase  for  that 
single  step,  there  is  no  guarantee  that  the  result  over  multiple  steps  is  optimal,  or 
that  there  is  not  a  better  multiple-step  solution  which  does  not  take  the  optimal 
single  step  action  at  each  step.  However,  the  starting  point  obtained  is  the  result  of 
a  sensible  algorithm  which  at  each  step  selects  the  best  action,  taking  into  account 
the  full  PDF  rather  than  considering  individual  pairs  of  components  in  isolation, 
hence  the  result  is  likely  to  produce  significantly  better  results  than  algorithms  which 
consider  only  individual  component  pairs. 
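
The greedy selection described above can be sketched for scalar mixtures as follows. This is a deliberately naive illustration that re-evaluates the full cost for every candidate action. Two points are assumptions rather than facts taken from the text: the merge uses the standard moment-preserving formulas (consistent with the stated property that the overall mean and covariance are unchanged), and a deletion simply drops the component without renormalizing the remaining weights.

```python
import math
from itertools import combinations


def gauss(d, s):
    """Scalar Gaussian density evaluated at offset d with variance s."""
    return math.exp(-0.5 * d * d / s) / math.sqrt(2.0 * math.pi * s)


def likeness(a, b):
    """Sum of pairwise likeness terms between two scalar mixtures."""
    return sum(pa * pb * gauss(ma - mb, va + vb)
               for pa, ma, va in a for pb, mb, vb in b)


def isd(original, reduced):
    """ISD cost: J_hh - 2 J_hr + J_rr."""
    return (likeness(original, original)
            - 2.0 * likeness(original, reduced)
            + likeness(reduced, reduced))


def merge(c1, c2):
    """Moment-preserving merge of two (weight, mean, variance) components."""
    (p1, m1, v1), (p2, m2, v2) = c1, c2
    p = p1 + p2
    m = (p1 * m1 + p2 * m2) / p
    v = (p1 * (v1 + (m1 - m) ** 2) + p2 * (v2 + (m2 - m) ** 2)) / p
    return (p, m, v)


def greedy_initialize(original, target_count):
    """Repeatedly apply the single deletion or merge with the lowest ISD cost."""
    reduced = list(original)
    while len(reduced) > target_count:
        candidates = []
        for k in range(len(reduced)):                      # every deletion
            candidates.append(reduced[:k] + reduced[k + 1:])
        for a, b in combinations(range(len(reduced)), 2):  # every pairwise merge
            rest = [c for k, c in enumerate(reduced) if k not in (a, b)]
            candidates.append(rest + [merge(reduced[a], reduced[b])])
        reduced = min(candidates, key=lambda cand: isd(original, cand))
    return reduced
```

Each candidate is scored against the full original mixture, which is the property that distinguishes this approach from rules that compare component pairs in isolation.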

3.3.4.1 Implementation. The algorithm was implemented using
MATLAB® version 6.5. As described in Section 3.3.2, the cost function consists
of  three  components:  the  self-likeness  of  the  original  mixture  with  itself,  the 
cross-likeness  of  the  original  mixture  and  the  simplified  mixture,  and  the 
self-likeness  of  the  simplified  mixture  with  itself.  Each  of  these  components  is  the 
sum  over  a  matrix,  the  entries  of  which  represent  the  likeness  of  pairs  of  individual 
Gaussian  components. 
Figure 3.6. Block diagram of proposed Gaussian mixture reduction initialization
algorithm.
The individual entries of the matrices are of the form of a multivariate Gaussian
function evaluation:

$$
\begin{aligned}
p_1 p_2\, & \mathcal{N}\{\mu_1;\ \mu_2,\ P_1 + P_2\} \\
  &= p_1 p_2\, |2\pi (P_1 + P_2)|^{-\frac{1}{2}}
     \exp\left\{ -\tfrac{1}{2} (\mu_1 - \mu_2)^T (P_1 + P_2)^{-1} (\mu_1 - \mu_2) \right\}
\end{aligned}
\tag{3.46}
$$


The original implementation evaluated the cost of each of the possible merging
and deletion possibilities at each processing cycle, with no regard for the calculations
which do not change between cycles. Consequently, the speed of the original
implementation was very poor: even using MATLAB® version 6.5 (using compiled
rather than interpreted code) the time required to simplify a 60-component,
four-dimensional Gaussian mixture down to 10 components was on the order of 349
seconds.¹⁰ Considering that, even in the simplest tracking environment with a single
target in clutter, this level of processing would need to be performed during every
measurement interval, this implementation is clearly unacceptable for practical use.


The elements of the cost function are illustrated in Figure 3.7; each square
represents an evaluation of the similarity measure between two single multivariate
Gaussian components from the respective mixtures, according to Eq. (3.46).
Consideration of the components of the calculations that are able to be stored and not
repeated at every processing cycle leads to the following observations:


1. The original mixture self-likeness matrix does not change when simplification
steps are performed, hence the sum of this matrix can be calculated once and
never re-evaluated.

2. When a component is deleted, a column of the cross-likeness matrix will be
deleted. Similarly, when two components are merged, two columns will be replaced
by a single new column, representing the cross-likeness of the original
Gaussian mixture to the newly merged component. To evaluate the cost of
every possible merge, the new columns will need to be calculated for every
possible merge. However, when a merge action is taken, only the new columns
for merge possibilities involving the modified component will need to be
recalculated, rather than the new columns for every possible merge. There are
$\frac{1}{2} N (N - 1)$ total merge possibilities, where $N$ is the number of components
in the reduced mixture at the current processing cycle, starting from $N_h$ at
the commencement of the algorithm and successively reducing to $N_r$ as components
are merged and deleted. Only $(N - 1)$ of these merge possibilities
involve a given component.

3. When a component is deleted, one column and one row of the reduced
self-likeness matrix will be deleted. When two components are merged, two rows
and columns will be replaced with a single row and column representing the
newly merged component.¹¹ To evaluate the cost for every possible merge, the
new column will need to be calculated for every pair of components. When
a merge action is taken, the full column will need to be recalculated for each
merge possibility involving the modified component, and the single entry
corresponding to the modified component will need to be recalculated for all other
possibilities.

¹⁰ All benchmarks discussed in this section were performed on a 1.4GHz AMD Athlon system.

Figure 3.7. Elements of ISD cost function. Each square represents a multivariate
Gaussian evaluation to measure the similarity of the respective components of the
two mixtures. Shaded squares represent the components that need to be re-evaluated
if the second component in the reduced mixture is modified.


The implementation was modified to store the cost components for every possible
merge and reuse wherever possible, reducing the complexity of the algorithm
to a stage at which the time to simplify a 60-component four-dimensional Gaussian
mixture down to 10 components was 30.3 seconds. Although this is a significant
improvement over the 349 second time of the original implementation, the algorithm
is still to be applied to problems significantly more complicated than this at every
measurement interval, hence it remains unacceptable for real-time application.

Analysis of the optimized implementation using the MATLAB® Profiler revealed
that 78% of the 30.3 second processing time was spent evaluating the multivariate
Gaussian function of Eq. (3.46) above. The MATLAB® implementation used
for the equation was (the variable mud stores the difference between the two mean
vectors, $\mu_d$, as seen in Eq. (3.47)):

    % Difference of the two mean vectors and sum of the two covariances
    mud = mu(:,i) - mu(:,j);
    Pc = P(:,:,i) + P(:,:,j);
    % Weighted Gaussian evaluation using an explicit matrix inverse
    dist = ps(i)*ps(j)*exp(-0.5*mud'*inv(Pc)*mud)/...
        real(sqrt(det(2*pi*Pc)));

The real() function call around the sqrt() is necessary to allow the expression
to be compiled using MATLAB® version 6.5 due to the possibility of a complex result.
On the surface the code appears to allow little room for optimization; however, in
the particular example discussed, the equation above was evaluated 291,726 times,
hence further consideration is warranted.

¹¹ Note that, since the matrix is symmetric, calculating the new column also gives the entries
required for the corresponding row.

Considering the quadratic expression mud'*inv(Pc)*mud, the result is a single
scalar value, yet the full matrix inverse is calculated. Perhaps the most obvious
simplification possible is to replace the full matrix inverse with the MATLAB® left
matrix divide command mud'*(Pc\mud), which implements Gaussian elimination to reach
a triangular form, followed by back-substitution with the mean difference vector mud
without calculating the full matrix inverse [33]. Furthermore, the determinant function
det() is commonly implemented as the product of the pivots, again found using
Gaussian elimination [33]. These same calculations have necessarily been performed
to find the inverse (or to perform back-substitution), hence this implementation causes
the same calculations to be performed twice.

Neither of these implementation options exploits the positive-definite symmetry
of the Pc matrix. This could be exploited using the Cholesky square-root
function [33:370] to factor the matrix into a triangular square root such that:

$$
P_c = \sqrt{P_c}\, \sqrt{P_c}^{\,T}
$$

Using the MATLAB® Cholesky square-root function,¹² the expression could be
replaced by:

    % Lower-triangular Cholesky factor: Pc = PcChol*PcChol'
    PcChol = chol(Pc)';
    dist = ps(i)*ps(j)*exp(-0.5*sum((PcChol\mud).^2))/...
        prod(sqrt(2*pi)*diag(PcChol));


where PcChol is the Cholesky square root of the matrix Pc. (Note that the determinant
has been simplified to the product of the diagonal terms of the triangular
Cholesky square root matrix.) This implementation leaves further room for optimization
in that the left matrix divide command may not immediately recognize the
triangular form of the Cholesky square root matrix, and that the Cholesky factorization
routine performs several expensive square root evaluations (Ml 403-405].

¹² The MATLAB® implementation of the Cholesky square root returns the transpose of the
conventional factorization, hence the transpose is taken of the result.

For the above reasons, a U-D factorization routine was implemented (M392],
followed by a custom back-substitution procedure. The U-D factorization routine
factors the covariance sum Pc into an upper triangular matrix and a diagonal matrix,
such that the quadratic can be simplified as:

$$
\begin{aligned}
\mu_d^T P_c^{-1} \mu_d &= \mu_d^T (U D U^T)^{-1} \mu_d \\
  &= \mu_d^T U^{-T} D^{-1} U^{-1} \mu_d \\
  &= (U^{-1} \mu_d)^T D^{-1} (U^{-1} \mu_d)
\end{aligned}
\tag{3.47}
$$

where the vector $(U^{-1} \mu_d)$ is evaluated using the custom back-substitution algorithm,
with the answer stored in a vector named Uimud. The expression is then evaluated
as:

    % det(U) = 1, so sqrt(det(2*pi*Pc)) = sqrt(prod(2*pi*diag(D)))
    dist = ps(i)*ps(j)*exp(-0.5*sum(Uimud.^2 ./ diag(D)))/...
        sqrt(prod(2*pi*diag(D)));
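
The U-D evaluation above can be sketched end to end as follows. This is an illustrative pure-Python version under stated assumptions, not the routine used in this work: matrices are lists of rows, U is taken to be unit upper triangular, and the function names are hypothetical.

```python
import math


def ud_factorize(P):
    """Factor a symmetric positive-definite matrix P as U D U^T.

    Returns (U, d), where U is unit upper triangular (a list of rows) and
    d holds the diagonal entries of D.
    """
    n = len(P)
    U = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    d = [0.0] * n
    for j in range(n - 1, -1, -1):           # work from the last column back
        d[j] = P[j][j] - sum(d[k] * U[j][k] ** 2 for k in range(j + 1, n))
        for i in range(j):
            acc = sum(d[k] * U[i][k] * U[j][k] for k in range(j + 1, n))
            U[i][j] = (P[i][j] - acc) / d[j]
    return U, d


def back_substitute(U, b):
    """Solve U y = b for unit upper triangular U."""
    n = len(b)
    y = [0.0] * n
    for i in range(n - 1, -1, -1):
        y[i] = b[i] - sum(U[i][k] * y[k] for k in range(i + 1, n))
    return y


def pair_likeness(p1, p2, mud, Pc):
    """p1 * p2 * N{mud; 0, Pc}, evaluated through the U-D factors of Pc."""
    U, d = ud_factorize(Pc)
    y = back_substitute(U, mud)
    quad = sum(yi * yi / di for yi, di in zip(y, d))
    # det(U) = 1, so det(2*pi*Pc) is the product of 2*pi*d_i.
    norm = math.sqrt(math.prod(2.0 * math.pi * di for di in d))
    return p1 * p2 * math.exp(-0.5 * quad) / norm
```

No square roots are taken inside the factorization itself; the single square root appears only in the normalizing constant.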


A simple test scenario was created, evaluating the expression for all possible
pairings of 500 randomly generated four-dimensional Gaussian mixture components.
The original implementation required 31.6 seconds to execute, the left matrix divide
method required 29.5 seconds to execute, the Cholesky square root method (avoiding
the redundant determinant calculation) required 21.5 seconds to execute, and the
custom U-D factorization method required 11.9 seconds to execute. The results
confirm the efficiency of the U-D factorization method, especially considering that
the time was far less than other methods even though a pure MATLAB® script
implementation was used, incorporating three levels of nested for loops.¹³

¹³ Analysis using the MATLAB® version 6.5 profiler confirmed that the routine and surrounding
loops were indeed compiled completely.
Figure 3.8. Execution times for various implementations for evaluating the match
between all pairings of 500 randomly generated four-dimensional Gaussian multivariate
PDFs. (Plot title: Time Required for 250,000 Multivariate Gaussian Evaluations.)

Following the above results, the U-D covariance factorization algorithm was
translated into the C language, using the MATLAB® MEX interface. For the simple
test above, the highly optimized C implementation reduced the time to 0.03 seconds,
a saving of roughly 1000× over the original implementation (which required 31.6
seconds). The MEX implementation was extended to implement the entire initialization
algorithm, reducing the time for reducing the same 60-component Gaussian mixture
described above from 30.3 seconds for the previous optimized version to 0.42 seconds,
a further 72× improvement over the previous optimization, and an 831× improvement
over the original implementation (which required 349 seconds). The time reductions
achieved using the various optimization methods on these two problems are illustrated
in Figures 3.8 and 3.9.

3.4  Summary 

This chapter has developed a structured, cost function-based technique which
reduces the number of components in a Gaussian mixture while modifying the overall
PDF structure less than any of the previously developed ad hoc methods. The
presentation in Section 3.2 examined some of the problems commonly experienced with
the techniques utilizing the most compact target state representations, JPDA and
CPDA, providing new insight into the problem of the bias of the JPDA algorithm,
and the reason why CPDA exhibits track coalescence to a greater extent than JPDA.
Section 3.3 then developed the cost function-based optimization method. Selections
for the cost function were discussed in Section 3.3.1; our choice was the ISD measure,
which was found to be both physically meaningful and computationally tractable. A
gradient-based iterative optimization algorithm was then developed using this cost
function, as well as an initialization algorithm to find a near-optimal starting point.

Figure 3.9. Execution times for various implementations of cost function Gaussian
mixture reduction initialization algorithm to simplify 60-component four-dimensional
Gaussian mixture to 10 components. (Plot title: Time Required to Simplify
60-Component Gaussian Mixture; axis label: Implementation.)

IV. Simulation Results


4.1 Introduction

The following sections present the results of simulations performed to examine
the implementation of the Gaussian mixture reduction algorithm described in
Sections 3.3.3 and 3.3.4. Section 4.2 presents the results of applying the
initialization algorithm to a simple one-dimensional problem with parameters chosen to
demonstrate several characteristics of the technique. Section 4.3 then illustrates the
refinement which may be gained through iterative optimization. Sections 4.4-4.6
present the results of simulations which examine the performance of the algorithms
in practical applications: first tracking a single target in clutter, then multiple targets
in clutter, and finally tracking a maneuvering target.

4.2 Initialization Algorithm

The initialization algorithm described in Section 3.3.4 provides a systematic
method of selecting mixture components for merging and deletion using the
Integral Square Difference (ISD) distance measure. The algorithm can be used to
provide a starting point for subsequent refinement using the iterative optimization
techniques described in Section 3.3.3. The following sections illustrate the
application of the initialization algorithm to a one-dimensional, five-component Gaussian
mixture. The parameters of the mixture are shown in Table 4.1. The main peak of
the mixture is produced by components 3 and 4, which have similar means and
probability weights and the same variance. The peak at x = 10 has the same variance as
the two central components, and a weight half that of the larger central component.
The wide component at x = 2 has the same weight as the peak on the right at x = 10,
but its variance is 10 times that of the right-hand peak, hence its peak is much
lower. The tall, narrow peak at x = 1 has half the probability weight of the previous
two, and a variance which is one twentieth of that of the components at x = 3, x = 4
and x = 10, and one two-hundredth of that of the larger variance component at x = 2.

Component #   Weight   Mean   Variance
1             0.083    1      0.1
2             0.167    2      20
3             0.25     3      2
4             0.333    4      2
5             0.167    10     2

Table 4.1. Parameters of the one-dimensional Gaussian mixture used
to test the initialization algorithm.

Figure 4.1 shows the original five-component Gaussian mixture (in the top-left
corner), and the approximations produced by the ISD initialization algorithm using
four, three and two components. The results are shown de-normalized, such that
the weights of the remaining components are not increased when a component is
deleted. The component weights are normalized at the end of each processing cycle
in the testing performed in Sections 4.4-4.6.

Table 4.2 shows the steps which are made to produce the approximations of
Figure 4.1, and the cost (using the ISD measure) of these steps. The first step is
to merge the two components (components 3 and 4 in Table 4.1) which combine to
produce the central peak. Visually, the approximation produced by this step appears
to be excellent, as illustrated in the top-right plot in Figure 4.1. This subjective
assessment is supported by the small cost, 9.9 × 10^-7, which is incurred by the
approximation. The parameters of the merged pair are placed in the lower component
index, and the higher index component is deleted and indexing adjusted accordingly,
such that the newly merged component becomes number 3 of 4.

Step   Action                             Cost
1      Merge components 3 and 4 (of 5)    9.9 × 10^-7
2      Delete component 2 (of 4)          1.8 × 10^-3
3      Merge components 1 and 2 (of 3)    5.7 × 10^-3

Table 4.2. Reduction steps for Gaussian mixture example.

Figure 4.1. Reduction of a five-component Gaussian mixture to four-,
three- and two-component approximations using the ISD
initialization algorithm (panels: Original Density, 4-Component
Approximation, 3-Component Approximation, 2-Component Approximation).
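
The merging step and its index bookkeeping can be sketched as follows. This is an illustration only: it assumes the usual moment-preserving merge (the merged component keeps the combined weight, mean and variance of the pair), which may differ in detail from the merge defined in Section 3.3.4, and the function names are hypothetical.

```python
def merge_pair(a, b):
    """Moment-preserving merge of two 1-D components (weight, mean, variance)."""
    wa, ma, va = a
    wb, mb, vb = b
    w = wa + wb
    m = (wa * ma + wb * mb) / w
    # Variance of the pair about the merged mean, including spread of the means.
    v = (wa * (va + (ma - m) ** 2) + wb * (vb + (mb - m) ** 2)) / w
    return (w, m, v)


def merge_into_lower(mixture, i, j):
    """Return a new mixture with components i and j (0-based) merged.

    The merged parameters replace the lower index; the higher index is
    removed, so later components shift down by one.
    """
    lo, hi = min(i, j), max(i, j)
    merged = merge_pair(mixture[lo], mixture[hi])
    return mixture[:lo] + [merged] + mixture[lo + 1:hi] + mixture[hi + 1:]


# Table 4.1 mixture as (weight, mean, variance) tuples.
TABLE_4_1 = [(0.083, 1.0, 0.1), (0.167, 2.0, 20.0), (0.25, 3.0, 2.0),
             (0.333, 4.0, 2.0), (0.167, 10.0, 2.0)]

reduced = merge_into_lower(TABLE_4_1, 2, 3)  # components 3 and 4, 0-based
print(reduced[2])
```

Under this assumption, merging components 3 and 4 gives a component with weight 0.583, mean near 3.57 and variance near 2.24, stored as component 3 of 4.
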

The second step in Table 4.2 is to delete the second mixture component. As
discussed in Section 3.3.1.3, the ISD cost measure applies more cost to smaller variance
components (which produce large, narrow peaks) than to larger variance components
(with flatter, broader peaks) with the same probability weight. This reduction step
demonstrates this predisposition: even though the narrow peak produced by
component 1 carries half the probability mass of the much broader component 2 (which
has a variance 200 times that of component 1), the cost function prefers to discard
component 2. The cost of this step (1.8 × 10^-3) is three orders of magnitude greater
than that of the first approximation step, as reflected visually in the modification
in the overall function produced by the step.

The final step in the reduction is to merge components 1 and 2, which
correspond to the narrow peak discussed above, and the large peak produced by the first
merging step. The cost of this step (5.7 × 10^-3) is on the same order of magnitude as
the previous approximation, as illustrated by the significant change produced in the
overall function. Interestingly, the cost of deleting component 1 (the narrow peak)
would have been 1.0 × 10^-2, which is even larger than the cost of deleting
component 3 (the smaller, wider peak on the right), 8.4 × 10^-3. This again demonstrates
the predisposition of the ISD measure towards giving more consideration to smaller
variance components than to larger variance components.
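
The preference for retaining narrow components can be reproduced with a short calculation. The sketch below is illustrative rather than the implementation used here: it evaluates the ISD between two one-dimensional Gaussian mixtures in closed form, using the identity that the integral of the product of N(x; m1, v1) and N(x; m2, v2) equals N(m1; m2, v1 + v2). It then compares the cost of deleting component 2 (wide) with that of deleting component 1 (narrow) from the Table 4.1 mixture, without renormalizing the remaining weights. Function names are hypothetical.

```python
import math


def gauss(x, mean, var):
    """One-dimensional Gaussian density N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def cross_term(f, g):
    """Integral of f(x) g(x) dx for mixtures of (weight, mean, variance)."""
    return sum(wf * wg * gauss(mf, mg, vf + vg)
               for wf, mf, vf in f for wg, mg, vg in g)


def isd(f, g):
    """Integral square difference between two Gaussian mixtures."""
    return cross_term(f, f) - 2.0 * cross_term(f, g) + cross_term(g, g)


def without(mixture, index):
    """Mixture with one component deleted (weights left de-normalized)."""
    return mixture[:index] + mixture[index + 1:]


ORIGINAL = [(0.083, 1.0, 0.1), (0.167, 2.0, 20.0), (0.25, 3.0, 2.0),
            (0.333, 4.0, 2.0), (0.167, 10.0, 2.0)]

cost_delete_wide = isd(ORIGINAL, without(ORIGINAL, 1))    # component 2
cost_delete_narrow = isd(ORIGINAL, without(ORIGINAL, 0))  # component 1
print(cost_delete_wide, cost_delete_narrow)
```

Deleting a single component of weight w and variance v costs w^2 / sqrt(4 pi v) under this measure, so the wide component costs about 1.8 × 10^-3 to delete while the narrow one, despite carrying half the weight, costs several times more.
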

Figures 4.2 and 4.3 show the results of the same approximations using
Salmond’s  joining  and  clustering  algorithms,  discussed  in  Section  2.5.11.21  Visually, 
the  representations  provided  by  these  approximations  are  poor  compared  to  the 
corresponding plots in Figure 4.1. This suggests that, on a visual level at least, the
approximations  produced  by  the  ISD  initialization  algorithm  are  superior  to  those 
produced  by  Salmond’s  joining  and  clustering  algorithms. 

Figure 4.2. Reduction of a five-component Gaussian mixture to
four-, three- and two-component approximations using
Salmond’s joining algorithm (panels: Original Density, 4-Component
Approximation, 3-Component Approximation, 2-Component Approximation).

Figure 4.3. Reduction of a five-component Gaussian mixture to
four-, three- and two-component approximations using
Salmond’s clustering algorithm.

4.3 Iterative Optimization


The iterative optimization techniques presented in Section 3.3.3 provide a
mechanism for converging to a local cost minimum which is close to the starting
point produced by the initialization algorithm. This iterative convergence acts as a
successive refinement of the PDF approximation, tuning the parameters in order to
provide a better representation of the original function. The operation of the
optimization algorithm is illustrated in Figure 4.4, using the same example discussed in
Section 4.2. The top-left panel shows the starting point for the optimization,
calculated using the ISD initialization algorithm. The result of the iterative optimization
is shown after 1, 2 and 12 iterations, providing successively closer approximations
to the original PDF (the approximation is shown using a solid line, while the
original PDF is shown using a dashed line). It is clear from Figure 4.4 that the overall
structure of the PDF remains unchanged: the changes made by the iterative
optimization represent more of a fine tuning than a large modification.

The cost function reduction as the optimization proceeds is shown in the upper
plot of Figure 4.5. The initial reduction is very significant, reducing the cost to
just 48% of its original value in two iterations, and 37% of its original value
in four iterations; the reduction after this initial period is less significant. The
break in the line at the ninth iteration indicates that the gradient step caused an
increase in the cost function value, hence the step was discarded, and repeated
using a smaller step size. The adaptive step size control algorithm described in
Section 2.6 was implemented; its operation is shown in the lower plot of Figure
4.5. The algorithm was set to terminate after fifty optimization steps, or when the
improvement produced by an optimization step was less than 0.001 of the cost at the
starting point. Figures 4.5 and 4.7 suggest that a more aggressive stopping criterion
would be to terminate the optimization when a step that increases the cost is taken
(noting that most of the benefit of the optimization has been gained by this stage).
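
The control flow described above (discard a step that raises the cost, retry with a smaller step, stop after fifty steps or when the improvement falls below 0.001 of the starting cost) can be sketched as follows. This is a generic illustration, not the algorithm of Section 2.6: the shrink and growth factors (0.5 and 1.2), the choice to count discarded steps towards the fifty-step limit, and the function name `minimize` are all assumptions.

```python
def minimize(cost, grad, x0, step=0.1, max_steps=50, rel_tol=1e-3,
             shrink=0.5, grow=1.2):
    """Gradient descent with a simple adaptive step size.

    Returns the final point and the cost after each accepted step.
    """
    x = list(x0)
    current = cost(x)
    initial = current
    history = [current]
    for _ in range(max_steps):
        g = grad(x)
        candidate = [xi - step * gi for xi, gi in zip(x, g)]
        new_cost = cost(candidate)
        if new_cost > current:
            # Step increased the cost: discard it and retry with a smaller step.
            step *= shrink
            continue
        improvement = current - new_cost
        x, current = candidate, new_cost
        history.append(current)
        step *= grow  # assumed growth rule after a successful step
        if improvement < rel_tol * initial:
            break
    return x, history


# Toy quadratic cost standing in for the ISD cost of a mixture approximation.
x_final, costs = minimize(lambda x: (x[0] - 3.0) ** 2,
                          lambda x: [2.0 * (x[0] - 3.0)],
                          [0.0])
print(x_final, costs[-1])
```

The more aggressive criterion suggested in the text would correspond to replacing the `continue` branch with an immediate `break`.
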

Figure 4.4. Iterative optimization of a 3-component approximation
(shown in solid line) to a 5-component Gaussian mixture
(shown in dashed line). The top-left panel shows the
starting point for the optimization, calculated using the
ISD initialization algorithm. The remaining panels show the
refined solution after 1, 2, and 12 gradient iterations.

Figure 4.5. Cost function trajectory and step size adjustment. The
top panel shows the cost reduction as the PDF approximation
is optimized iteratively, while the bottom panel
shows the gradient step size adaptation.

Figure 4.6. Iterative optimization of a 3-component approximation
(shown in solid line) to a 5-component Gaussian mixture
(shown in dashed line). The top-left panel shows the
starting point for the optimization, calculated using the
ISD initialization algorithm. The remaining panels show
the refined solution after 4, 9, and 29 gradient iterations.


The  operation  of  the  iterative  optimization  technique  is  more  obvious  when 
the  starting  point  provided  to  the  algorithm  is  further  from  a  minimum.  Figures 
4.6 and 4.7 illustrate such a situation, in which the starting point is far from a local
minimum.  The  example  reveals  that  the  same  minimum  is  reached  eventually;  many 
other  starting  points  will  not  produce  this  result. 


4.4 Single Target in Clutter

Figure 4.7. Cost function trajectory and step size adjustment. The
top panel shows the cost reduction as the PDF approximation
is optimized iteratively, while the bottom panel
shows the gradient step size adaptation.

The single target scenario presented in [11] was reproduced to test the
performance of the ISD initialization and optimization algorithms in a realistic
tracking environment. The scenario simulates a radar tracking a target flying
through dense clutter. The target state evolves according to the following constant
velocity state model:


\[
\begin{aligned}
x(k) &= \begin{bmatrix} 1 & T & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T \\ 0 & 0 & 0 & 1 \end{bmatrix} x(k-1)
      + \begin{bmatrix} T^2/2 & 0 \\ T & 0 \\ 0 & T^2/2 \\ 0 & T \end{bmatrix} w(k-1) \\
z(k) &= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} x(k) + v(k)
\end{aligned}
\qquad (4.1)
\]


where T is the time between measurements (k - 1) and k, and w(k) and
v(k) are two independent zero-mean white noise processes such that:

\[
E\{w(k)\,w(k)^T\} = Q = qI, \qquad E\{v(k)\,v(k)^T\} = R = rI
\]


The system is provided with noise-corrupted measurements of the target position
(x and y coordinates) through a linear measurement model; the system could be
extended to polar measurements (i.e., range and angle) using an extended Kalman
filter as described in Section 2.2.4.

To match the parameters presented by Salmond [44:16], T, q and r were all
normalized to unity, the clutter density λ was set to 0.012, and the probability of
detection (Pd) was set to unity. The gate size was reduced such that Pg = 0.99,
reducing  computational  complexity  significantly  over  the  value  used  in  [44} 43],  Pg  = 
0.999.  The  target  was  initially  located  at  the  origin  with  a  velocity  of  10  units/sec 
in  each  coordinate. 
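
A minimal simulation of this scenario might look like the sketch below. It assumes the state is ordered as (x position, x velocity, y position, y velocity), that the process noise enters as an acceleration on each axis, and that only the two positions are measured; the helper names are hypothetical, and plain lists are used instead of a matrix library to keep the example self-contained.

```python
import math
import random

T = 1.0  # sample interval, normalized to unity as in the text

# Assumed state ordering: [x position, x velocity, y position, y velocity].
F = [[1.0, T, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, T],
     [0.0, 0.0, 0.0, 1.0]]
G = [[T * T / 2.0, 0.0],
     [T, 0.0],
     [0.0, T * T / 2.0],
     [0.0, T]]
H = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]


def matvec(A, x):
    """Matrix-vector product for lists of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def propagate(x, q, rng):
    """One step of the constant velocity model with process noise variance q."""
    w = [rng.gauss(0.0, math.sqrt(q)) for _ in range(2)]
    return [a + b for a, b in zip(matvec(F, x), matvec(G, w))]


def measure(x, r, rng):
    """Noisy position measurement with measurement noise variance r."""
    v = [rng.gauss(0.0, math.sqrt(r)) for _ in range(2)]
    return [a + b for a, b in zip(matvec(H, x), v)]


rng = random.Random(0)
state = [0.0, 10.0, 0.0, 10.0]  # at the origin, 10 units/sec in each coordinate
for k in range(3):
    state = propagate(state, 1.0, rng)
    print(k + 1, measure(state, 1.0, rng))
```
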

The measurement space was populated with false targets according to a Poisson
distribution with density λ = 0.012 measurements per unit area. The region
populated  was  a  square,  centered  on  the  actual  target  location,  with  side  200^. 
This  value  was  chosen  to  be  large  such  that  hypotheses  could  be  deceived  by  clutter 
measurements  for  several  processing  cycles  without  leaving  the  populated  region. 
The  expected  number  of  false  targets  in  each  processing  cycle  for  this  configuration 
is  480. 
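
The expected count follows directly from the density and the area of the square: 0.012 × 200 × 200 = 480. A sketch of how such clutter might be generated is given below. It assumes a side of 200 in the normalized units, which is the value consistent with the expected count of 480, and it draws the Poisson count by accumulating unit-rate exponential inter-arrival times because the Python standard library has no Poisson sampler. The function names are hypothetical.

```python
import random


def expected_clutter_count(side, density):
    """Mean number of false measurements in a square of the given side."""
    return density * side * side


def sample_poisson(mean, rng):
    """Poisson draw: count unit-rate arrivals falling in the interval [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def generate_clutter(center, side, density, rng):
    """Uniformly distributed false measurements in a square about `center`."""
    n = sample_poisson(expected_clutter_count(side, density), rng)
    half = side / 2.0
    return [(center[0] + rng.uniform(-half, half),
             center[1] + rng.uniform(-half, half)) for _ in range(n)]


rng = random.Random(0)
clutter = generate_clutter((0.0, 0.0), 200.0, 0.012, rng)
print(expected_clutter_count(200.0, 0.012), len(clutter))
```
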


The criterion for loss of track suggested in [44:14-15] is that the target-originated
measurement has not been incorporated into the measurement gate of any
hypothesis for the last five consecutive time steps, or that the combined estimate is
more than 10σ from the true target location for five consecutive time steps (the
comparison being performed independently for each state element, ignoring off-diagonal
covariance elements), where σ is the standard deviation of the state estimate from
the Kalman filter without measurement origin ambiguity. The problem with the
latter criterion is that the combined estimate can and will venture far from the correct
target location without the system losing track: as long as at least one hypothesis
remains within the vicinity of the actual target location, the combined estimate
will probably (or, at least, potentially) move back to the correct location when the
uncertainty is resolved using information from later measurement sets. To resolve
this difficulty, the latter criterion was modified such that loss of track is declared
if all hypotheses are more than 10σ from the correct target location for more than
five consecutive time steps, hence taking into account the deferred decision making
capability of the multiple hypothesis techniques.
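
The modified criterion can be expressed compactly in code. The sketch below is one possible reading, not the test harness used here: it assumes a hypothesis counts as "far" when any single state element deviates from the truth by more than 10σ for that element, and it declares loss of track once every hypothesis has been far for more than five consecutive time steps. The data layout and the name `track_lost` are hypothetical.

```python
def track_lost(error_history, sigma, factor=10.0, max_streak=5):
    """Apply the modified loss-of-track test.

    error_history[t][h][e] is the estimation error at time step t, for
    hypothesis h and state element e; sigma[e] is the reference standard
    deviation of state element e.
    """
    streak = 0
    for hypotheses in error_history:
        # Assumption: a hypothesis is "far" if any element exceeds factor * sigma.
        all_far = all(
            any(abs(err) > factor * s for err, s in zip(h, sigma))
            for h in hypotheses
        )
        streak = streak + 1 if all_far else 0
        if streak > max_streak:
            return True
    return False


sigma = [1.0, 1.0]
near, far = [0.5, 0.5], [20.0, 0.0]
print(track_lost([[far, far]] * 6, sigma))   # all hypotheses far for six steps
print(track_lost([[far, near]] * 6, sigma))  # one hypothesis stays near
```

Because a single nearby hypothesis resets the counter, the test reflects the deferred decision making of the multiple hypothesis approach rather than the position of the combined estimate.
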


It is worth noting that, using the original criteria proposed in [44], even the
optimal Bayesian solution (with unbounded computational and memory requirements)
will potentially have a very limited track life. While the optimal solution is
guaranteed¹ to maintain a mixture component corresponding to the correct hypothesis,
there is no guarantee that the combined estimate will be close to the correct
location at any point in time. Hence, to evaluate how effectively a particular hypothesis
reduction technique approximates the full Bayesian solution, the modified criterion
is preferable.

1 Although the use of measurement gating would weaken this guarantee.

# Comp.      1     2     3-8   9     10    15    20    25    30    35    40
ISD Init.    200   200   200   200   200   200   200   200   200   106   -
Pruning      200   200   200   200   200   200   200   200   200   200   200
Joining      200   200   200   200   200   200   200   169   110   109   -
Clustering   200   200   200   200   200   200   200   200   110   109   -
Lainiotis    -     -     50    -     -     -     -     -     -     -     -
Iter. Opt.   50    50    50    50    50    -     -     -     -     -     -

Table 4.3. Number of Monte Carlo simulations run for each algorithm
and number of mixture components.


The initial set of results consisted of 200 Monte Carlo simulations, each of which
was allowed to run until loss of track was declared.² The simulations were run for
the  ISD  initialization  algorithm,  the  standard  MHT  pruning  algorithm  (discussed 
in  Section  2.5.111),  and  Salmond’s  joining  and  clustering  algorithms  (discussed  in 
Section  12.5.11-2]),  using  various  numbers  of  hypotheses  for  each.  Some  simulations 
were  also  run  using  the  mixture  reduction  algorithm  described  in  Lainiotis  [3T]  (as 
outlined  in  Sectionl2.5.11.1D.  and  using  the  iterative  optimization  technique  described 
in Section 3.3.3. Some of the simulations using large numbers of mixture components
required a large amount of time to process (mainly due to the extremely long track
life achieved using these algorithms), hence they were terminated before the full 200
runs had completed.³ The number of simulations run for each algorithm and each
number of mixture components is summarized in Table 4.3.

The average track life for the various algorithms tested is compared in
Figure 4.8; this was one of the major metrics used to compare algorithm performance
in [44]. The diagram clearly reveals the remarkable performance of the ISD
initialization technique using a large number of mixture components: the average track
life is significantly greater than that of the algorithms which previously have been
considered to provide the best performance in this scenario. The exponential
increase of track life exhibited by the ISD initialization algorithm indicates that the
tracking performance is limited only by the availability of computational resources.
This is in sharp contrast to the algorithms which were previously considered to
provide the best performance, which demonstrate an average track life that levels out as
the number of mixture components is increased, indicating that additional computer
resources would provide little performance benefit. The following sections examine
the performance of this algorithm in comparison with the existing systems shown
in Figure 4.8: standard MHT pruning, Salmond’s joining and clustering algorithms,
and a modified Lainiotis algorithm, as well as the ISD-based iterative optimization
technique.

Figure 4.8. Average track life for various merging and pruning algorithms
(plot title: "Comparison of Average Track Life"; horizontal axis: Number of
Mixture Components; series: ISD Initialization, ISD Iterated, Joining,
Clustering, Lainiotis, Pruning).

2 The number of time steps was actually capped at 10,000 for each simulation, but using the
various algorithms, this limit was never reached.

3 In fact the simulations presented were computed using 23 Intel® Pentium® IV-based systems
over a period of approximately two weeks.

Figure 4.9. Performance of ISD initialization algorithm compared to
the standard MHT pruning algorithm (plot title: "Track Life Comparison:
Integral Square Difference Initialization vs Pruning"; horizontal axis:
Number of Mixture Components).

4.4.1 Comparison with Pruning Algorithm. Figure 4.9 compares the
performance of the ISD initialization algorithm to a pruning algorithm which keeps the
most likely hypotheses, up to the desired number. The bars in the graph of Figure
4.9 indicate the proportion of simulations in which each algorithm outperformed the
other: the black region denotes the proportion of simulations in which the track life
of the ISD initialization algorithm was longer than that of the pruning algorithm;
the white region denotes the proportion of simulations in which the track life of
the pruning algorithm was longer than that of the ISD initialization algorithm; and
the gray region denotes the proportion of simulations in which the track life of the
two algorithms was essentially the same (i.e., within 10 scans of each other). It is
the belief of the author that this method of presentation provides the most reliable
comparison of the performance of two algorithms. If the two algorithms lose track at
approximately the same time, then the same sequence of measurements caused loss
of track on each. If the sequence of measurements causes loss of track on one
algorithm but not the other, then the surviving algorithm is demonstrated to be superior
in that circumstance. The amount of time that the surviving algorithm maintains
track after the point where loss of track occurred for the other algorithm is largely
irrelevant, as the performance of the algorithm losing track has not been tested past
the point where loss of track occurred.⁴ Accordingly, a performance metric
considering each run equally according to which algorithm maintained track longer, and
not how much longer the surviving algorithm maintained track, provides the fairest
comparison.
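
The three proportions plotted in this style of comparison can be computed from paired track lives as in the following sketch. It assumes that both algorithms were run on the same sequence of Monte Carlo trials, and that "within 10 scans" is inclusive; the function name is hypothetical.

```python
def compare_track_life(life_a, life_b, tie_window=10):
    """Proportion of paired runs won by A, tied, and won by B.

    life_a[i] and life_b[i] are the track lives (in scans) of the two
    algorithms on the same Monte Carlo run i.
    """
    wins_a = ties = wins_b = 0
    for a, b in zip(life_a, life_b):
        if abs(a - b) <= tie_window:
            ties += 1
        elif a > b:
            wins_a += 1
        else:
            wins_b += 1
    n = wins_a + ties + wins_b
    return wins_a / n, ties / n, wins_b / n


# Made-up track lives for four paired runs, for illustration only.
print(compare_track_life([100, 50, 300, 80], [95, 200, 120, 80]))
# → (0.25, 0.5, 0.25)
```

Each run contributes equally, regardless of how much longer the surviving algorithm maintained track.
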


The scatter plot in Figure 4.10 compares the track life of the two algorithms
for each of the 200 Monte Carlo simulations, using 25 mixture components for each
algorithm. Each cross on the diagram represents a single Monte Carlo simulation:
the x-coordinate is the number of scans for loss of track to occur using the ISD
initialization algorithm, while the y-coordinate is the number of scans for loss of track
to occur using the pruning algorithm on exactly the same simulation. The dashed
45° line is the contour for which the performance of the two algorithms is identical.
A cross above this dashed line represents a Monte Carlo simulation for which the life
of the pruning algorithm was longer than that of the ISD initialization algorithm;
conversely, a cross below the dashed line represents a Monte Carlo simulation for
which the life of the ISD initialization algorithm was longer than that of the pruning
algorithm. The concentration of crosses significantly below the 45° line indicates
that the ISD initialization algorithm performs significantly better than the pruning
algorithm.


In many Monte Carlo simulations it was obvious that the hypothesis utilization
of the simplistic pruning method was poor. The performance of the pruning
algorithm utilizing 100 mixture components is compared to the 25-component ISD
initialization algorithm in Figure 4.11.⁵ The diagram demonstrates that, even using
four times the number of mixture components, the performance of the pruning
algorithm is consistently inferior to that of the ISD initialization algorithm. Even in
these simulations, utilizing 100 mixture components, it was observed that the
hypotheses remained clustered in a bunch for most of the simulation, rather than being
utilized effectively to follow significantly different branches. The Gaussian mixture
PDF has the capability to model complex multi-modal distributions, yet if a simple
pruning mechanism is used to keep only the most likely hypotheses, the hypotheses
retained by the algorithms will be anything but multi-modal. Only with an efficient
merging algorithm is the true power of the MHT realized.

4 i.e., there is nothing to suggest that the algorithm losing track would not have been able
to maintain track as well as or better than the surviving algorithm if the algorithm losing
track had been able to maintain track through the sequence of measurements causing the loss.

5 The simulation utilized the extended clutter population discussed in Section 4.4.2. 37 Monte
Carlo runs were calculated.

Figure 4.10. Performance of 25-component ISD initialization algorithm
compared to 25-component pruning algorithm (horizontal axis: 25-Comp ISD
Initialization Track Life, number of scans; vertical axis: 25-Comp Pruning
Track Life, number of scans).

Figure 4.11. Performance of 25-component ISD initialization algorithm
compared to 100-component pruning algorithm (plot title: "Comparison of
Track Life").
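
For reference, the simple pruning rule discussed in this section amounts to the following sketch, which keeps the highest-weight components up to the desired number and renormalizes the weights. The tuple layout and function name are illustrative assumptions, not the code used in the simulations.

```python
def prune(components, max_components):
    """Keep the most likely components and renormalize their weights.

    Each component is a (weight, mean, variance) tuple.
    """
    kept = sorted(components, key=lambda c: c[0], reverse=True)[:max_components]
    total = sum(c[0] for c in kept)
    return [(w / total, m, v) for w, m, v in kept]


mixture = [(0.083, 1.0, 0.1), (0.167, 2.0, 20.0), (0.25, 3.0, 2.0),
           (0.333, 4.0, 2.0), (0.167, 10.0, 2.0)]
print(prune(mixture, 2))
```

Applied to the mixture of Table 4.1 with two components retained, the rule keeps only the two neighboring central components, which illustrates how the retained hypotheses tend to cluster around a single mode.
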


4.4.2 Comparison with Salmond’s Joining and Clustering Algorithms. On
the surface, the computational complexity of the initialization method discussed in
Section 3.3.4 appears to be an order of magnitude higher than the complexity of the
joining and clustering algorithms presented in [44-47]. However, in many
simulations in this study, instability was encountered in the merged covariance matrices,
increasing the computational complexity of the joining and clustering algorithms to
an order of magnitude above the ISD initialization method.


The joining and clustering filter covariance appeared stable in simulations in
which the probability of detection was set close to unity. However, when the
probability of detection was reduced to 0.95 (which is still higher than experienced in
many practical applications), the covariance of hypotheses farther from the true
target tended to increase to such an extent that the measurements for the entire
surveillance region were associated with the hypothesis. In such a situation, the
computational complexity increases dramatically with the hypothesis covariance (more
measurements are within the association gate, resulting in more hypotheses), an
increase which is bounded only by the size of the surveillance region. This problem
was addressed by limiting the maximum number of measurements in the association
gate for any single hypothesis to 50,⁶ which should not be exceeded in any practical
situation with a stable filter covariance. This approximation was not applied in the
initial set of simulations; the performance of the algorithm using this limit is shown
in Figure 4.17.

6 i.e., the 50 measurements closest to the predicted value for that hypothesis.
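
A limit of this kind could be applied as in the sketch below, which retains the measurements nearest to the predicted measurement of a hypothesis. The distance metric is an assumption: plain squared Euclidean distance is used here for simplicity, whereas a gating implementation would more typically rank by the innovation-covariance-weighted distance. The function name is hypothetical.

```python
import heapq


def limit_gate(measurements, predicted, max_count=50):
    """Return at most max_count measurements closest to the prediction."""
    def sq_dist(z):
        # Assumed metric: squared Euclidean distance to the predicted value.
        return sum((zi - pi) ** 2 for zi, pi in zip(z, predicted))
    return heapq.nsmallest(max_count, measurements, key=sq_dist)


gated = [(float(i), 0.0) for i in range(100)]
print(len(limit_gate(gated, (0.0, 0.0))))
```
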

As discussed in Section 3.3.1.3, the ISD cost function applies lower cost
weighting to components with larger covariance than to those with smaller covariance.
Because of this de-prioritization, large covariance components tend to be discarded,
thus avoiding this explosive increase in computational complexity. In this regard, the
ISD initialization algorithm appeared stable throughout the simulations, far more so
than algorithms which concentrate purely on merging.

Another problem with the joining and clustering algorithms is the delicate
relationship which exists between the threshold used to discard components and
the probability that the target is detected and is within the association gate. This
was observed in simulations in which the probability of detection was set to 0.999,
and the probability that the target is within the gate was set to 0.98 and omitted from
the hypothesis probability calculation (effectively setting Pdg = 0.999 in Eq. (2.60)
rather than Pd). In the resulting set of hypotheses, events in which the target is
not detected are de-weighted by a factor of 999 in comparison with events in which the
target is detected. Following the advice of Salmond [44], the threshold for discarding
components was set such that the least likely 1% of hypotheses are discarded
and the remainder maintained and merged until the desired level of reduction has
been achieved. However, with the probability of detection set to 0.999, this guarantees
that almost all events hypothesizing missed detection will be discarded without
further consideration. In such situations, if the target is not detected (or if the
target-originated measurement falls outside of the association gate), the correct
hypothesis (missed detection) will be discarded, and the system will quite possibly lose
track. Even with this incorrect hypothesis probability calculation, the ISD initialization
method performed well, demonstrating the robustness which is incorporated
into the algorithm through the trade-off between the cost of merging components
and the cost of deleting components.
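
The interaction between a 0.999 detection probability and a 1% discard threshold can be
illustrated with a toy calculation. This is a sketch under stated assumptions, not the
thesis code: the threshold is interpreted as discarding the least likely hypotheses whose
combined weight stays within 1% of the total probability mass (the interpretation given in
Section 4.4.5), and the ten equally likely gated measurements are hypothetical.

```python
def prune_least_likely(weights, mass=0.01):
    """Return the indices kept after discarding the least likely hypotheses
    whose combined weight does not exceed `mass` of the total weight."""
    total = sum(weights)
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    discarded, accumulated = set(), 0.0
    for i in order:
        if accumulated + weights[i] > mass * total:
            break
        accumulated += weights[i]
        discarded.add(i)
    return [i for i in range(len(weights)) if i not in discarded]

P_D = 0.999
n_meas = 10  # hypothetical number of gated measurements, assumed equally likely

# Index 0 is the missed-detection hypothesis; the rest associate one measurement each.
weights = [1.0 - P_D] + [P_D / n_meas] * n_meas

odds = P_D / (1.0 - P_D)          # relative weighting of detection against missed detection
kept = prune_least_likely(weights)
print(round(odds), kept)
```

In this toy case the missed-detection hypothesis carries about 0.1% of the probability
mass, so it is always inside the discarded 1% regardless of whether it is the correct one.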

The performance of Salmond's joining algorithm is compared to the ISD initialization
algorithm in Figure 4.12. The diagram demonstrates that each algorithm
exhibits better performance at different ends of the spectrum. For a small number of
mixture components, the performance of the two algorithms is roughly equivalent.
The Salmond joining algorithm exhibits noticeably better performance than the ISD
initialization algorithm when the number of mixture components is between four and
seven. The scatter plot comparing the ISD initialization and joining algorithms using
five components is shown in Figure 4.13. The diagram demonstrates that there is a
major concentration of simulations in which the joining algorithm performs slightly
better than the ISD initialization algorithm, and that apart from this cluster (in the
bottom left corner, slightly above the 45° line), there is little difference between the
two algorithms.

Figure 4.12. Performance of ISD initialization algorithm compared
to Salmond joining algorithm.

Figure 4.13. Performance of 5-component ISD initialization algorithm
compared to 5-component Salmond joining algorithm.

Figure 4.14. Performance of 30-component ISD initialization algorithm
compared to 30-component Salmond joining algorithm.

At the upper end of the scale (for 15 mixture components and greater), it is
apparent that the ISD initialization algorithm significantly outperforms the joining
algorithm, which previously provided the best performance in this scenario. The
scatter plots for the 30- and 35-component cases are shown in Figures 4.14 and 4.15,
respectively. The diagrams illustrate that the ISD initialization algorithm outperforms
the joining algorithm significantly for a large proportion of the simulations.

Figure 4.15. Performance of 35-component ISD initialization algorithm
compared to 35-component Salmond joining algorithm.
[Axis label: 35-Comp Joining Track Life (number of scans)]

The excellent performance of the ISD initialization algorithm is indicative of the
efficiency gained by taking all mixture components into consideration when selecting
merging and deletion actions, rather than considering only individual pairs, as
done in the Salmond algorithm. Another explanation of the failing performance of
the joining algorithm is the deletion threshold. As the total number of hypotheses
increases, the probability of each individual hypothesis decreases, and
there is a large number of incorrect hypotheses with which the correct hypothesis
must compete to gain probability. In such a situation it is possible that the correct
hypothesis could fall within the least likely 1% of hypotheses, and mistakenly be
deleted. This is another example of the lack of robustness which is unavoidable in
algorithms that are unable to evaluate the relative cost between merging components
and deleting components.


The performance of Salmond's clustering algorithm is compared to the ISD
initialization algorithm in Figure 4.16. The diagram exhibits the same general trends
as Figure 4.12, although the overall performance of the clustering algorithm is inferior
to that of the joining algorithm.⁷ The ISD initialization algorithm's performance
increases for large numbers of components far more in comparison with the clustering
algorithm than in the comparison with the joining algorithm, which indicates that
the clustering algorithm is less efficient when a large number of components is being
used.


The joining and clustering implementations previously discussed did not limit
the maximum number of measurements associated with a single hypothesis, hence
difficulties with unstable covariance growth were experienced, as previously described.
The comparison is repeated in Figure 4.17 for the 30- and 35-component
simulations (utilizing 200 Monte Carlo simulations for the 30-component case and
106 for the 35-component case) with the maximum number of hypotheses spawned
by any parent hypothesis limited to 50. Comparing the diagram with Figures 4.12
and 4.16 indicates that the hypothesis limiting makes very little difference to the
performance of Salmond's algorithms. Any practical implementations of the algorithms
would need to limit the number of hypotheses to ensure computational tractability.

7 As described in [44:447], the clustering algorithm was designed as a further approximation to
the joining algorithm in an attempt to reduce the computational complexity.

Figure 4.16. Performance of ISD initialization algorithm compared
to Salmond clustering algorithm.

Figure 4.17. Performance of ISD initialization algorithm compared
to Salmond clustering and joining algorithms with the
maximum number of hypotheses spawned by any parent
hypothesis limited to 50.

Following from the earlier discussion of the instability of component covariances
using Salmond's joining algorithm, the 25-, 30- and 35-component simulations
were repeated, with the region populated by clutter increased in size by 10× in each
dimension, from a square of side $200\sqrt{r}$ to a square of side $2{,}000\sqrt{r}$. This change
in the clutter population region increases the expected number of clutter-originated
measurements for each scan interval from 480 to 48,000. To process this number of false
measurements, a more efficient gating routine was necessary; the algorithm described in
Appendix A.2, which is algebraically equivalent to the original gating methodology,
was employed. Using these updated parameters, 73 Monte Carlo simulations were
calculated. As the clutter density was not modified, one would expect that the
performance would be unchanged (on average), unless the algorithms were being aided
by the limited clutter population region. The average track life of each algorithm
for the modified scenario is shown in Figure 4.18 (all algorithms had the number
of hypotheses spawned by any parent hypothesis limited to 50). The diagram confirms
the trend that the performance of the ISD initialization algorithm increases
exponentially as the number of components is increased, whereas the performance
of the joining and clustering algorithms levels out. This again indicates that the
performance of the ISD initialization algorithm is limited only by the availability of
computing resources, whereas little performance benefit is gained using the joining
and clustering algorithms by increasing the size of the mixture beyond 25 components.
Comparing Figure 4.18 to Figure 4.8 reveals that the average track life of
all algorithms has been reduced somewhat, indicating that all of the algorithms
may have been assisted by the limited clutter population region. The
histogram of the data used for these points is shown in Figure 4.19. From this diagram,
it is difficult to judge whether the increased clutter population region caused a
deterioration in the performance of the various algorithms. There is a visible difference
between the original simulation histogram and the extended clutter population
region histogram for the 35-component joining algorithm; in most other cases it is
difficult to declare such a difference. This reduction in performance for the case
in which the clutter population region is extended in size indicates that the algorithm
is benefitting from the limited clutter population region. This phenomenon is
probably caused by incorrect hypotheses drifting beyond the area populated by false
measurements and being deleted when they otherwise may have survived longer.

Figure 4.18. Average track life for scenario using extended clutter
population region.
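
The clutter counts quoted for the extended region follow from holding the clutter density
fixed while the side of the square grows tenfold. As a quick check, taking the side lengths
as $200\sqrt{r}$ and $2{,}000\sqrt{r}$ and writing $\lambda$ for the clutter density (a symbol
introduced here only for illustration):

```latex
\lambda = \frac{480}{(200\sqrt{r})^{2}} = \frac{0.012}{r},
\qquad
\lambda \, (2000\sqrt{r})^{2} = \frac{0.012}{r} \cdot 4 \times 10^{6}\, r = 48{,}000 .
```

The area, and hence the expected number of false measurements per scan, grows by a factor
of 100 even though each side grows only by a factor of 10.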

Even if the various algorithms were aided by the limited clutter population region,
this is not entirely unrealistic. Practical radar systems have limited detection
ranges, and individual detections are limited by the maximum unambiguous range
(and Doppler) imposed by the waveform in use [50]. Hence the realistic radar
measurement space will not extend infinitely, but rather it will exist within a bounded
region.

The performance of the ISD initialization algorithm is compared to the joining
and clustering algorithms for the updated scenario in Figure 4.20. The diagram
again confirms the superior performance of the ISD initialization algorithm for large
numbers of mixture components; the performance difference is significantly greater
than that shown in Figure 4.12, indicating that Salmond's algorithm was being
assisted by the finite clutter population region to a much greater extent than the ISD
initialization algorithm. The ISD initialization and Salmond joining algorithms using
25, 30 and 35 mixture components are compared in the scatter plots of Figures
4.21, 4.22 and 4.23, respectively. The diagrams demonstrate that the ISD initialization
algorithm increasingly outperforms the Salmond joining filter as the number of
mixture components grows. The scatter plots have very few points above the 45°
line, and a large number of points significantly below the 45° line, indicating that
the ISD algorithm is outperforming Salmond's joining algorithm, which has previously
provided the best performance in this scenario, by a considerable margin. The
difference between the algorithms appears to be greater than that shown in Figures
4.14 and 4.15, which again indicates that the joining algorithm was being aided by
the limited clutter population region. Once again it should be noted that Salmond's
joining algorithm previously was considered to provide the best performance for this
type of scenario.

Figure 4.19. Histogram of track life for ISD initialization and
Salmond's joining and clustering algorithms, utilizing
25, 30 and 35 mixture components. Plots labelled
"ECPR" describe the Monte Carlo simulations utilizing
the extended clutter population region; the remaining
plots describe the original scenario.
[Surviving panel titles: 25-Comp ISD Init; 25-Comp Joining; 25-Comp Clustering]

Figure 4.20. Performance of ISD initialization algorithm compared to
Salmond joining and clustering algorithms, with clutter
population region increased in size by ten times in both
x and y axis directions.

Figure 4.21. Performance of 25-component ISD initialization algorithm
compared to 25-component Salmond joining algorithm,
with clutter population region increased in size
by ten times in both x and y axis directions.
[Axis label: 25-Comp Joining Track Life (number of scans)]

Figure 4.22. Performance of 30-component ISD initialization algorithm
compared to 30-component Salmond joining algorithm,
with clutter population region increased in size
by ten times in both x and y axis directions.
[Axis label: 30-Comp Joining Track Life (number of scans)]

Figure 4.23. Performance of 35-component ISD initialization algorithm
compared to 35-component Salmond joining algorithm,
with clutter population region increased in size
by ten times in both x and y axis directions.
[Axis label: 35-Comp Joining Track Life (number of scans)]


4.4.3 Comparison with Lainiotis Algorithm. The algorithm of Lainiotis [31]
was considered as another reference to provide further comparison. The implementation
described in [31:627] uses two separate thresholds for merging and deleting,
using Eqs. (2.76) and (2.77) to evaluate the cost of each action. In order to provide
a better comparison to the initialization algorithm presented in Section 3.3.4, the
algorithm was implemented to compare the cost of all possible actions (both merging
and deleting), and the action producing the smallest cost was taken. When this
implementation was executed, the algorithm was found to delete mixture components
at almost every step, rarely choosing to merge at all.⁸ This trend was observed also
when the algorithm was applied to the simple one-dimensional reduction discussed in
Section 4.2, leading to extremely poor reduced PDF approximations. This indicates
that the costs calculated for deleting and merging components are not suitable for
comparison: they provide a reasonable mechanism for evaluating the relative cost of
deleting different components, and likewise a reasonable mechanism for evaluating
the relative cost of merging pairs of components, but they do not provide a trade-off
between deleting and merging. This also reveals why the author chose to select a
different threshold for deleting and merging.

8 The increased performance of the various merging algorithms presented previously as compared
to the standard MHT pruning algorithm demonstrates that merging components is almost always
a better choice than deleting components.

After observing from the ISD initialization algorithm that the lowest cost action
is usually merging components, and that the system rarely chooses to delete components,
the Lainiotis algorithm was implemented to merge sequentially the pair of
components which leads to the smallest reduction in the Bhattacharyya coefficient
between the original PDF and the approximation (denoted $\rho_a$), according to the
approximation of Eq. (2.77):

$$\rho_a \geq 1 - (p_i + p_j)\sqrt{1 - \rho_{i,j}^{2}}$$

where $p_i$ and $p_j$ are the weights of the two components to be merged, and $\rho_{i,j}$ is the
Bhattacharyya coefficient between the two individual components being considered
for merging. The merging continues until the number of mixture components has
been reduced to the desired level. The algorithm was run for the same test case
described above, for 50 Monte Carlo runs, using 3, 4, 5, 6, 7 and 8 mixture components.
The results of the algorithm are compared to the ISD initialization algorithm
in Figure 4.24. The diagram indicates that the performance of the technique is not
as good as that provided by either joining or clustering, and thus the technique
falls further short of ISD initialization than either of Salmond's algorithms. Observing
the form of the cost function of Eq. (2.77), the difference is probably caused by the
way in which the probability weights are incorporated into the equation.
The Salmond expressions (Eqs. (2.78) and (2.79)) incorporate the probability
weights through the factor $p_i p_j/(p_i + p_j)$. This expression is a smooth interpolation
of the minimum of the two probability weights, similar to the combined resistance
of two resistors in parallel. Accordingly, Salmond's expressions will tend to allow
very small components to be merged into larger components, effectively providing
a mechanism for deleting small components without changing the overall structure
of the PDF. Lainiotis' expression incorporates the probability weights through the
factor $(p_i + p_j)$, which will be large if either weight is large, not providing such a
deletion mechanism.

Figure 4.24. Performance of ISD initialization algorithm compared
to modified Lainiotis algorithm.
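
The contrast between the two weight factors is easy to see numerically. The sketch below
evaluates only the factors named in the text, not the full cost expressions of Eqs. (2.77)
to (2.79); the function names and the example weights are hypothetical.

```python
def salmond_weight_factor(p_i, p_j):
    """p_i * p_j / (p_i + p_j), the parallel-resistor style factor."""
    return p_i * p_j / (p_i + p_j)

def lainiotis_weight_factor(p_i, p_j):
    """(p_i + p_j), the factor appearing in the Bhattacharyya bound."""
    return p_i + p_j

# Hypothetical weight pairs: two large, one large with one tiny, two tiny.
pairs = [(0.5, 0.5), (0.5, 0.001), (0.001, 0.001)]
for p_i, p_j in pairs:
    print(p_i, p_j,
          salmond_weight_factor(p_i, p_j),
          lainiotis_weight_factor(p_i, p_j))
```

For the pair (0.5, 0.001) the first factor is just under 0.001, so absorbing the tiny
component is cheap, whereas the second factor is about 0.5, as large as for a pair of
major components. In general the first factor always lies between half the smaller weight
and the smaller weight itself.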

4.4.4 Comparison with Iterative Optimization Algorithm. The results presented
previously were obtained using the ISD initialization algorithm, without utilizing
the iterative optimization method described in Section 3.3.3. The initialization
algorithm was utilized without the iterative optimization algorithm mainly due to
the computational complexity of the iterative technique.

The iterative optimization method was applied over 50 Monte Carlo simulations
in order to evaluate the performance enhancement produced. Considering the
discussion of Section 5131 it would not be surprising if the iterative optimization
technique did not produce a substantial increase in performance, as the overall structure
of the PDF approximation remains largely unmodified. However, the results of these
simulations appear to indicate that the performance using the iterative optimization
method is actually slightly worse than the initialization algorithm alone, which is
rather surprising. The performance of the two algorithms is compared in Figure
4.25, using 1 to 10 mixture components. The scatter plot for the 10-component case
is shown in Figure 4.26, demonstrating that the iterative optimization substantially
reduces the track life in a number of simulations.

This result is even more surprising when one considers that the iterative optimization
technique is guaranteed not to increase the cost of the reduced PDF
representation, and that in almost any practical situation the cost will indeed be
reduced. Thus a PDF representation producing a lower
ISD cost does not necessarily result in better tracking performance. An interesting
interpretation of this result is found when one considers the equations used to calculate
the parameters of the merged component when two mixture components are
combined in the ISD initialization algorithm, as shown in Eq. (2.24). The parameters
in this equation are such that the mean and covariance of the overall mixture remain
unchanged through the merging operation. These parameters do not necessarily
represent the optimal fit of a single component to the original two components
according to the ISD cost function.⁹ Rather, these parameters can be shown to be the
optimum parameters according to the Maximum Likelihood measure, as presented
in Section 3.3.1.4. Accordingly, the ISD initialization algorithm is not based on the
ISD cost function alone: it uses the ISD cost function to select which components
to merge or delete, and then uses the Maximum Likelihood measure to calculate the
parameters of the merged components. The result of this section therefore indicates
that the performance of this "hybrid" approach, incorporating the Maximum Likelihood
measure to select the parameters of the merged components, is better than
that of a "pure" ISD implementation. This is not surprising when one considers
that the Maximum Likelihood measure was the preferred cost function in terms of
physical meaningfulness, except that it could not be evaluated.¹⁰

Figure 4.25. Performance of ISD initialization algorithm compared to
same algorithm utilizing iterative optimization to refine
the approximation.
[Plot title: Track Life Comparison, Integral Square Difference Initialization vs
Integral Square Difference Optimized. Axis label: Proportion of Simulations.
Legend: ISD Iteration Better; Same; ISD Initialization Better]

Figure 4.26. Comparison of track life for simulations of 10-component
ISD initialization algorithm, and the same algorithm
utilizing iterative optimization to refine the approximation.
[Axis label: 10-Comp ISD Iterated Track Life (number of scans)]
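
The moment-preserving property of the merge can be verified directly. The sketch below
uses the standard moment-matching merge of two weighted Gaussian components, which is
assumed here to be what Eq. (2.24) expresses since that equation is not reproduced in this
section; the function names and the three-component test mixture are hypothetical.

```python
import numpy as np

def merge_moment_preserving(w1, m1, P1, w2, m2, P2):
    """Merge two weighted Gaussian components so that the weight, mean and
    covariance of the pair (and hence of the whole mixture) are unchanged."""
    w = w1 + w2
    a1, a2 = w1 / w, w2 / w
    m = a1 * m1 + a2 * m2
    d = (m1 - m2).reshape(-1, 1)
    P = a1 * P1 + a2 * P2 + a1 * a2 * (d @ d.T)  # spread-of-means term
    return w, m, P

def mixture_moments(components):
    """Overall mean and covariance of a list of (weight, mean, covariance) tuples."""
    W = sum(w for w, _, _ in components)
    mean = sum(w * m for w, m, _ in components) / W
    cov = sum(w * (P + np.outer(m - mean, m - mean)) for w, m, P in components) / W
    return mean, cov

# Hypothetical 2-D mixture with three components.
c1 = (0.5, np.array([0.0, 0.0]), np.eye(2))
c2 = (0.3, np.array([2.0, 1.0]), 2.0 * np.eye(2))
c3 = (0.2, np.array([-4.0, 3.0]), np.array([[3.0, 0.5], [0.5, 1.0]]))

merged = merge_moment_preserving(*c1, *c2)
mean_before, cov_before = mixture_moments([c1, c2, c3])
mean_after, cov_after = mixture_moments([merged, c3])
print(np.allclose(mean_before, mean_after), np.allclose(cov_before, cov_after))
```

The check confirms only that the first two moments are preserved; it says nothing about
whether the merged component minimizes the ISD cost, which is the distinction the text draws.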


9 As far as the author is aware, there is no closed-form solution for the optimal parameters,
according to the ISD cost function, of a single component representing a pair of components.

10 The parameters for the merged component can be found in closed form using the Maximum
Likelihood measure because the logarithm operation in Eq. (3.16) is of the reduced mixture. Thus,
if the reduced mixture contains only a single component (as is the case when we are solving for
the parameters of the single component which provide the best fit to a pair of components), then
the logarithm will be of a single Gaussian function, which results in an expression which can be
evaluated in closed form, and a parameter optimization that can be solved also in closed form. If the
reduced mixture contains multiple components, then the logarithm operation cannot be simplified,
and the cost function must be evaluated through numerical integration or approximation. Hence,
purely due to computational tractability, the ISD cost function remains preferable.

4.4.5 Comparison with PDA Algorithm. In order to provide a comparison
with the performance of the various algorithms using a single Gaussian component,
the simulations were run for the Probabilistic Data Association (PDA) algorithm,
described in Section 2.5.7. In this scenario the clutter density was high enough that
the covariance of the PDA algorithm exhibited unbounded growth until loss of track
occurred. The results of the simulations are compared in Figure 4.27. It is not
surprising that the performance of the Salmond merging and clustering algorithms
is almost identical to that of the PDA algorithm. Other than the deletion logic
(which discards the set of least likely hypotheses which contribute 1% of the overall
probability mass), these algorithms are algebraically identical to PDA. The slight
reduction in performance indicates that the deletion logic is harmful to the overall
performance. It is also not surprising that the performance of the pruning algorithm
is worse than that of PDA. A single-component pruning algorithm is the Nearest
Neighbor algorithm described in Section 2.5.6, and the poor performance of the
Nearest Neighbor algorithm compared to PDA has been well documented [4:139-141,
©373].

Figure 4.27. Performance of PDA compared to other algorithms using
a single Gaussian component.
[Axis label: Test Algorithm]

The surprising result of Figure 4.27 is that the single-component ISD initialization
algorithm visibly outperforms PDA. In order to verify this result, this comparison
was repeated for 1,000 Monte Carlo simulations. The scatter plot for these
simulations is shown in Figure 4.28. In this new set of simulations, the ISD
initialization algorithm outperformed the PDA algorithm 12.5% of the time, the PDA
algorithm outperformed the ISD initialization algorithm 8.8% of the time, and the
two were essentially identical (track life within 10 scans of each other) for 78.7%
of the time. The diagram shows that there is a large concentration of simulations for
which the PDA performs slightly better than the ISD initialization algorithm, but
the difference in performance is less than 10 scans, hence they are counted as being
identical. The performance of the ISD initialization algorithm is spread much further
out towards the larger values in Figure 4.28, indicating that the track life is more
likely to be significantly longer than that of the PDA algorithm. The histograms of
the track life of the two algorithms are shown in Figure 4.29. The diagram shows
that, while the means of the two track life distributions are roughly identical, the
ISD initialization is further skewed such that there are more points in the tail of the
distribution, representing significantly longer track life. Overall, however, one would
suppose that the mean performance is not significantly improved.

Figure 4.28. Performance of PDA compared to [...] using a single Gaussian component.
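
The three-way tally used in the PDA comparison (better, worse, or within 10 scans) can be
sketched as follows. The function name and the sample track lives are hypothetical, and
treating a difference of exactly 10 scans as "same" is an assumption, since the text
describes the band both as "within 10 scans" and as "less than 10 scans".

```python
def classify_outcomes(life_a, life_b, tolerance=10):
    """Fractions of paired simulations in which algorithm A tracks longer,
    algorithm B tracks longer, or the two are within `tolerance` scans."""
    counts = {"a_better": 0, "b_better": 0, "same": 0}
    for a, b in zip(life_a, life_b):
        if abs(a - b) <= tolerance:
            counts["same"] += 1
        elif a > b:
            counts["a_better"] += 1
        else:
            counts["b_better"] += 1
    n = len(life_a)
    return {key: value / n for key, value in counts.items()}

# Hypothetical track lives (in scans) for four paired Monte Carlo runs.
isd_life = [100, 50, 20, 35]
pda_life = [40, 55, 90, 25]
result = classify_outcomes(isd_life, pda_life)
print(result)
```

Because small differences are absorbed into the "same" bin, a cluster of runs in which one
algorithm is marginally ahead does not show up in the better or worse fractions.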


4.5 Multiple Targets in Clutter

In order to test the performance of the system tracking multiple targets in
clutter, the system was extended to the multiple target model as presented in Section
2.5 and tested using a two-target scenario. The system was programmed to generate
joint association hypotheses for every joint association event, each with the estimated
joint state of the targets, joint state covariance and hypothesis weight, as described in
Section 2.5.5. While the implementation performed well (as expected) in low-clutter
tracking conditions, the computational complexity prevented any testing from being
conducted in high clutter conditions (where the more efficient merging algorithm is
beneficial). This prevented any meaningful comparison with previously published
merging and pruning algorithms.

The SB-MHT algorithm described in Section 2.5.10 maintains separate sets of
single target hypotheses for each individual target, alongside listings of compatible
hypotheses which can be used to form joint hypotheses (each including one single
target hypothesis from each target's list). Such a structure retains much of the benefit
of the direct joint target state representation, but for a fraction of the memory and
computational cost. A multiple target extension of the ISD initialization algorithm
using such an architecture could be performed; however, due to time limitations this
was not attempted.

4.6 Single Maneuvering Target

As discussed in Section 2.4.2, the PDF of target state for a single target which
switches between different dynamics models at unknown time instants is also a Gaussian
mixture in which the number of components grows exponentially with time.
Accordingly, this is another problem to which the ISD initialization and optimization
algorithms could be applied.

In order to test the performance of such an implementation, a full-order
Bayesian filter was developed using a Markov transition model. The development
of the algorithm follows Section 2.4.2; the structure of the algorithm is shown in
Figure 2.3. At the end of each processing cycle the hypotheses are combined using
the ISD initialization algorithm such that the maximum number of hypotheses
is not exceeded. When hypotheses are merged, a different form of transition
probability is required, to represent the probability that the system transitions
from one of a merged set of models to a new model. This calculation is described
in Appendix A.3.

The algorithm was tested using a scenario adapted from [21], which simulates
a target flying segments of constant velocity separated by constant-acceleration
turns. The state space representation of the target truth model is:


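\begin{equation}
\begin{aligned}
x(k) &=
\begin{bmatrix} 1 & T & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T \\ 0 & 0 & 0 & 1 \end{bmatrix} x(k-1)
+ \begin{bmatrix} \frac{T^2}{2} & 0 \\ T & 0 \\ 0 & \frac{T^2}{2} \\ 0 & T \end{bmatrix} u(k-1)
+ \begin{bmatrix} \frac{T^2}{2} & 0 \\ T & 0 \\ 0 & \frac{T^2}{2} \\ 0 & T \end{bmatrix} w(k-1) \\
z(k) &= \begin{bmatrix} z_x(k) \\ z_y(k) \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} x(k) + v(k)
\end{aligned}
\tag{4.2}
\end{equation}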


where $T$ is the time between measurement intervals $(k-1)$ and $k$, and $w(k)$ and
$v(k)$ are two independent zero-mean white noise processes such that:

$$E\{w(k)w(k)^T\} = Q = qI$$
$$E\{v(k)v(k)^T\} = R = rI$$

where the measurement noise covariance $r = 2000$, the dynamics noise covariance
$q = 10^{-4}$, and the acceleration input $u(k)$ is as shown in Table 4.4. The initial
velocity of the target is 15 units/sec in the $-y$ direction.




Sample    Model                          Acceleration
1-35      Constant velocity              0
36-40     Constant velocity plus input   [0.5 0.5]^T
41-55     Constant velocity              0
56-65     Constant velocity plus input   [-0.2 -0.2]^T
66-100    Constant velocity              0

Table 4.4. Parameters for maneuvering target scenario.
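As an illustration only, the truth model of Eq. (4.2) and the maneuver schedule of Table 4.4 can be sketched in a few lines of Python. The sample period `T = 1.0`, the initial position at the origin, the application of the input as $u(k-1)$ at sample $k$, and all function names are assumptions made for this sketch; none of them is specified in the text.

```python
import numpy as np

def truth_matrices(T):
    """Return (F, G, H) of Eq. (4.2) for the state ordering [x, vx, y, vy]."""
    F = np.array([[1.0, T, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, T],
                  [0.0, 0.0, 0.0, 1.0]])
    G = np.array([[T**2 / 2, 0.0],
                  [T, 0.0],
                  [0.0, T**2 / 2],
                  [0.0, T]])
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    return F, G, H

def acceleration_input(k):
    """Acceleration input u(k) from Table 4.4 (zero outside the maneuvers)."""
    if 36 <= k <= 40:
        return np.array([0.5, 0.5])
    if 56 <= k <= 65:
        return np.array([-0.2, -0.2])
    return np.zeros(2)

def simulate_truth(T=1.0, q=1e-4, r=2000.0, n_samples=100, seed=0):
    """Simulate states and noisy position measurements for samples 1..n_samples."""
    rng = np.random.default_rng(seed)
    F, G, H = truth_matrices(T)
    # Assumed start at the origin; velocity is 15 units/sec in the -y direction.
    x = np.array([0.0, 0.0, 0.0, -15.0])
    states, measurements = [], []
    for k in range(1, n_samples + 1):
        w = rng.normal(0.0, np.sqrt(q), size=2)  # E{w w^T} = q I
        v = rng.normal(0.0, np.sqrt(r), size=2)  # E{v v^T} = r I
        x = F @ x + G @ acceleration_input(k - 1) + G @ w
        states.append(x)
        measurements.append(H @ x + v)
    return np.array(states), np.array(measurements)
```

Setting `q=0` removes the process noise, which makes it easy to confirm that the two maneuvers change the velocity by the expected net amount.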


The filter bank consists of three filters: two utilize constant
acceleration models and the third utilizes a constant velocity model. The constant
velocity filter uses the model described in Eq. (4.1), with $q = 10^{-4}$, as per the truth
model. The constant acceleration models utilize the standard extension of Eq. (4.1):

where, as previously, $T$ is the time between measurement instants $(k-1)$ and $k$,
and $w(k)$ and $v(k)$ are two independent zero-mean white noise processes such that:


\[
E\{w(k)\,w(k)^T\} = Q = qI, \qquad E\{v(k)\,v(k)^T\} = R = rI
\]


One of the constant acceleration filters uses $q = 10^{-4}$ to handle the constant
acceleration input once the model has stabilized after the initial maneuver onset; the other
has $q = 0.25$ to aid convergence at onset. The Markov transition model was set
such that the probability that the system will stay in a given model is 0.98, and the
probability that the system will switch from a given model to each of the remaining
models is 0.01.
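For concreteness, the transition model just described can be written as a row-stochastic matrix. The following is a minimal sketch; the function names and the convention `Pi[i, j] = P{model j at k | model i at k-1}` are assumptions of the sketch rather than notation taken from the text.

```python
import numpy as np

def markov_transition_matrix(n_models=3, p_stay=0.98):
    """Build a transition matrix with p_stay on the diagonal.

    The remaining probability is shared equally among the other models,
    which gives 0.01 per switch for three models and p_stay = 0.98.
    """
    p_switch = (1.0 - p_stay) / (n_models - 1)
    Pi = np.full((n_models, n_models), p_switch)
    np.fill_diagonal(Pi, p_stay)
    return Pi

def predict_model_probabilities(Pi, mu_prev):
    """Propagate model probabilities one step: mu_j = sum_i Pi[i, j] * mu_prev[i]."""
    return Pi.T @ np.asarray(mu_prev, dtype=float)

Pi = markov_transition_matrix()
mu_pred = predict_model_probabilities(Pi, [1.0, 0.0, 0.0])
```

Each row of `Pi` sums to one, so the predicted model probabilities remain a valid distribution.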

The simulations were computed for 50 Monte Carlo runs using both the Bayesian
switching model approximation and the IMM. The Bayesian switching model
approximation used the ISD initialization algorithm to combine the outputs of the
filters down to three estimates at the end of the processing cycle. Each of these
three estimates was then processed using each dynamics model in the following
processing cycle, similarly to the GPB-2 structure. The Root-Mean-Square (RMS) error
of the system using the Bayesian switching model approximation is shown in Figure
4.30, and the RMS error of the system using the IMM is shown in Figure 4.31. The
results demonstrate that the performance of the Bayesian switching model
approximation is worse than that of the IMM. The reason for this is that, as discussed in
Section 3.3.1.3 and as illustrated in Section 4.2, the ISD cost function applies more
cost to components with smaller variance than to those with larger variance. In
the problem of data association, as examined in previous sections, most of the
mixture components have covariances of similar magnitude, hence this weighting is not
harmful, and at times it can even be beneficial. In the problem of switching
models, however, the covariance matrices proposed by the various models are of vastly
different orders of magnitude: some propose that the target is travelling on a
regular, predictable path, and others propose that the target is exhibiting a
high-jerk maneuver. In this situation, the ISD initialization algorithm will tend to merge
or discard the more agile maneuver hypotheses, even if they are more probable than
the lower covariance non-maneuvering hypotheses. This explains why the error of
the ISD initialization system is lower than the error of the IMM in non-maneuvering
portions of the simulations, and higher than that of the IMM at the harsh maneuver
onset, as seen in Figures 4.30 and 4.31.

[Plot: RMS Position Error and Filter Prediction for Bayesian Switching Approximation]

Figure 4.30. RMS position and velocity error of system utilizing
Bayesian switching model approximation. Filter-predicted
RMS error shown in dashed line.

The result of Figure 4.30 was largely unchanged when nine hypotheses were
retained rather than three, effectively lengthening the memory of the system by a
further sample period. This indicates that the limiting factor in the performance
of the IMM is the Markov transition model rather than the approximation
applied to the target state PDF. This suggests that altering the transition model,
perhaps to a time-varying Markov model, could be of great benefit. A suggestion for
such a model is discussed in Section 5.4.
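The bias of the ISD cost toward low-variance components can be illustrated numerically. The sketch below assumes the simplest possible reduction step, in which a single component of weight $w$ and covariance $P$ is deleted without renormalizing the remaining weights; the integral of the squared difference is then $w^2 \int \mathcal{N}\{x; \mu, P\}^2 dx = w^2 |4\pi P|^{-1/2}$, which follows from Eq. (A.15). The function name and the example weights and covariances are hypothetical.

```python
import numpy as np

def isd_cost_of_pruning(weight, cov):
    """ISD incurred by deleting one Gaussian component (no renormalization).

    Uses the closed form w^2 * |4*pi*P|^(-1/2) that follows from Eq. (A.15).
    """
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    return weight**2 / np.sqrt(np.linalg.det(4.0 * np.pi * cov))

# A less probable but tight (non-maneuvering) hypothesis ...
cost_tight = isd_cost_of_pruning(0.3, np.eye(2))
# ... versus a more probable but diffuse (maneuvering) hypothesis.
cost_diffuse = isd_cost_of_pruning(0.6, 100.0 * np.eye(2))
print(cost_tight, cost_diffuse)
```

Under these assumed numbers the diffuse component is cheaper to discard despite carrying twice the probability weight, which is the behavior described in the paragraph above.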

4.7 Summary

The major outcome of this chapter is that the performance of the ISD
initialization algorithm for tracking a single target in clutter is significantly better than
that of any of the previously published methods tested in the comparison.
Furthermore, it was demonstrated that the performance of the ISD initialization algorithm
increases exponentially as the number of mixture components increases, whereas
existing methods are unable to provide any significant improvement using more than
25 mixture components. Although no results were obtained for the multiple target
tracking problem, a result similar to the single target case is likely if a
computationally feasible extension is developed.

V. Conclusions and Recommendations


5.1 Restatement of Research Goal

As stated in Section 1.2, the goal of this study was to develop a technique
of maintaining a high fidelity representation of the target state Probability Density
Function (PDF) while limiting the number of Gaussian mixture components to
retain computational tractability. The procedure defined a physically meaningful cost
function to measure the deviation from the true target state PDF, and proceeded
by sequentially selecting the simplification steps to minimize the cost of the
reduction. The performance of this cost function-based approximation was demonstrated
in a realistic single target tracking problem, as presented by Salmond [44]. These
simulations indicate that the track life (the standard metric for comparison of such
algorithms) achievable utilizing the new approximation raises tracking performance
to a previously unattainable level.

5.2  Summary  of  Results 

The results presented in Chapter IV demonstrate the performance of the
Gaussian mixture reduction algorithm based on the Integral Square Difference (ISD) cost
function. Section 4.2 applied the initialization algorithm to a one-dimensional
problem, illustrating the competency of the reduction steps chosen. Section 4.3 then
demonstrated the refinement offered using the iterative optimization technique.

5.2.1 Single Target Tracking Performance. The results presented in Section
4.4 reveal the significant improvement in performance offered by the ISD
initialization algorithm. It was demonstrated that, while the performance of the algorithm
is no better than previous techniques when fewer than 10 components are utilized,
when 25 or more components are used, the track life performance is considerably
better than that achievable using any of the existing methods compared.
Furthermore, the trend of the average track life shown in Figures 4.8 and 4.18 suggests that
the increase in performance will continue as the number of components grows: there
is no indication that the performance will level out as seen with the other algorithms.
Accordingly, the ISD initialization algorithm not only provides a level of performance
which was previously unattainable, but the level of performance achievable using the
algorithm appears to be limited only by the computational resources available. As
computational power increases, the algorithm has the potential to extend the track
life possible in a high clutter environment far beyond that provided by any previous
algorithm.

5.2.2 Multiple Target Tracking Performance. The application of the ISD
initialization algorithm to the multiple target tracking problem revealed the excessive
computational complexity of the methodology used in the multiple target extension.
As discussed below in Section 5.4, an MHT-like extension of the hypothesis creation
algorithm which maintains separate lists of hypotheses for each target, alongside a
list of joint hypotheses linking the single-target hypotheses together, would allow the
ISD initialization technique to be applied to the multiple target tracking problem,
providing a similar performance benefit to the single target case, but with a more
modest computational load.

5.2.3 Maneuvering Target Tracking Performance. The results presented in
Section 4.6 demonstrate that the ISD cost function is not appropriate for use with
the problem of switching dynamics models due to the large variability of the
covariance in each model. The results also appeared to indicate that the performance in
the scenario is limited more by the Markov transition model than the PDF
representation. An extension to a time-varying Markov transition model is suggested in
Section 5.4.

5.3 Significant Contributions of Research

The Multiple Hypothesis Tracking (MHT) technique represents the
state-of-the-art tracking algorithm in modern civilian and military radar systems. However,
the common implementation relies on simplistic ad hoc pruning and merging
techniques to perform the most vital function of the algorithm: hypothesis control. This
thesis directly addresses the problem of hypothesis control, making several important
contributions, including those listed in the following pages.

1. The Integral Square Difference (ISD) cost function defined in Section 3.3.1.3 is
both physically meaningful and computationally tractable; this latter attribute
was seen to be rare among common cost function selections. By developing a
cost function which can be evaluated in closed form, the resultant reduction
algorithm is able to consider the impact of a merging or pruning operation
on the entire mixture, rather than individual components or component pairs,
leading to a remarkable improvement in tracking performance.

2. Apart from being able to be evaluated in closed form, the ISD cost function
is also continuously differentiable, and its first derivatives are also able to be
evaluated in closed form using standard vector-matrix notation. This leads
to an easy application of iterative optimization methods as described in
Section 3.3.3, which have not previously been applied to the Gaussian mixture
reduction problem. Although the simulation results presented in Section 4.4.4
indicate that the improvement gained over the initialization algorithm is
negligible for the target tracking problem, it remains a valuable concept which may
be beneficial in other applications.

3. The tracking performance of the ISD initialization algorithm presented in
Section 4.4 demonstrates the benefit of the cost function-based technique. For
larger numbers of mixture components, the performance of the ISD
initialization algorithm is significantly greater than that of previously published
techniques. The trend illustrated in Figure 4.8 indicates that, in the problem under
consideration, the performance achieved using the ISD initialization algorithm
with 30 mixture components is greater than that attainable with existing algorithms
using any feasible number of components. Furthermore, as the computational
power available increases, the algorithm is capable of providing a level of
performance that increases exponentially with the number of mixture components,
whereas the previously proposed algorithms are unable to improve performance
beyond that achieved using 25 components.

4. The significance of the Maximum Likelihood cost function proposed in Section
3.3.1.4 should not be overlooked. Although this function does not lead to a
tractable implementation, its physical interpretation as the "goodness of fit"
of the reduced-complexity PDF to the full PDF (as derived in Section 3.3.1.4)
distinguishes it as possibly the most physically meaningful cost function of
which one could conceive. Approximations to this cost function may be able
to yield a significant alternative to the more mathematically tractable ISD
technique developed herein.

5. The tutorial on existing data association algorithms presented in Section 2.5
differs significantly from previous presentations (such as those in [2, 3]), and
provides a clear understanding of the approximations inherent to the
algorithms, and the resultant strengths and weaknesses.

6. The examination of the bias and coalescence problems of the JPDA and CPDA
algorithms presented in Section 3.2 reveals new insight into the cause of the
difficulties commonly experienced with these techniques. In Eqs. (3.2)-(3.7)
it is proven that JPDA is in fact unbiased, which is in direct contradiction
to the analyses presented in [16, 19, 22]. The thorough explanation of the
poor performance of CPDA as compared with JPDA expands and corrects the
previous theory, as published in [12].

7. Finally, the efficient method of evaluating a multivariate Gaussian PDF
outlined in Section 3.3.4.1 and the two-stage gating procedure described in
Appendix A.2 both provide major computational savings, and are applicable to
a wide range of scientific computation applications. To the knowledge of the
author, neither of these developments has been previously published.

5.4 Recommendations for Future Investigations

While the ISD initialization algorithm proposed in Section 3.3.4 provides a
substantial increase in performance over existing methods, the computational
complexity of the technique will be of significant concern for any practical implementation.
An important area for future investigation is to examine computational
enhancements of the algorithm. For example, the current implementation considers the cost
for every possible action at each step, selecting only a single action. From the
beginning of the reduction process, it will commonly be clear that many possible actions
are not worth considering, and thus the computational load of the algorithm could
be reduced considerably by neglecting such options.

As discussed in the previous section, the Maximum Likelihood measure
derived in Section 3.3.1.4 is probably the most physically meaningful cost function
for this application. Although the ISD cost function was chosen for its
tractability, its predisposition toward neglecting higher variance components was clear, and
this characteristic was demonstrated to make the function inappropriate for some
applications. There is potential for significant improvement in performance through
development of techniques based on the Maximum Likelihood measure or
approximations thereof.

The results presented in Section 4.4 clearly demonstrate the performance of the
ISD initialization algorithm in a single target tracking problem. The method adopted
for a multiple target implementation, directly forming and merging joint hypotheses,
led to a structure which was computationally untenable, preventing generation of any
meaningful results (discussed in Section 4.5). The extension of the Gaussian mixture
reduction algorithm to a multiple target scenario while maintaining links between
compatible single target hypotheses (as opposed to the target PDF marginalization
inherent to the extension proposed by Pao [38], discussed in Section 2.5) remains
a significant area of research.

The application of the ISD initialization algorithm to the problem of switching
target dynamics models demonstrated that the ISD cost function was inappropriate
for this application, and that the time-invariant Markov transition model was quite
possibly the more important limitation on the performance of the system. As
mentioned briefly in Section 2.4.2, the use of the Markov model assumes that
transition probabilities depend only on the previous model index, and not on prior model
histories or prior measurements. These assumptions are applied in the development
of Eq. (2.39), in which the model switching probabilities, which naturally depend on
the entire model history and measurement history, are assumed to depend only on
the previous model index. A simple variant of these assumptions would be to allow
dependence of the Markov transition probabilities on recent measurements, hence
creating a time-varying Markov model which adapts itself as observations are
received. One idea for such a structure would be to adjust the transition probabilities
according to the properties of the residuals for each of the filters in recent processing
cycles. If one filter is clearly dominant, then the transition probabilities can be
adjusted accordingly to use this filter almost exclusively in the estimator output, and
to reinitialize other filters continually using this estimate. If the residual properties
of all filters are similar, or if the model with the smallest residual changes, then the
transition probabilities corresponding to a change of model could be increased to
respond to this uncertainty. Using a structure based on the Sequential Probability
Ratio Test (SPRT) or the extensions discussed in [54], the transition probabilities
could be increased whenever the most recent residuals indicate that, to within a
certain confidence level, the model in force is changing. Such a development could
enhance both the steady state performance of the system, and the speed with which
the system responds to the onset of a maneuver.

Appendix A. Derivations


A.1 Product of Two Gaussians of Same Dimension

This section develops a simplification of the result of the product of two
multivariate Gaussian PDFs of the same dimension. The result presented herein is utilized
throughout Chapter III.

The variable of both PDFs is denoted $x$; the first Gaussian has mean $\mu_1$ and
covariance $P_1$, while the second has mean $\mu_2$ and covariance $P_2$. Writing the product
in full:


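\[
\begin{aligned}
\mathcal{N}\{x; \mu_1, P_1\}\,\mathcal{N}\{x; \mu_2, P_2\}
&= |2\pi P_1|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}(x-\mu_1)^T P_1^{-1}(x-\mu_1)\right\} \cdot
   |2\pi P_2|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}(x-\mu_2)^T P_2^{-1}(x-\mu_2)\right\} \\
&= |2\pi P_1|^{-\frac{1}{2}}\,|2\pi P_2|^{-\frac{1}{2}} \cdot
   \exp\left\{-\tfrac{1}{2}\left[(x-\mu_1)^T P_1^{-1}(x-\mu_1) + (x-\mu_2)^T P_2^{-1}(x-\mu_2)\right]\right\}
\end{aligned}
\tag{A.1}
\]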


Manipulating  the  exponents: 

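\[
\begin{aligned}
&(x-\mu_1)^T P_1^{-1}(x-\mu_1) + (x-\mu_2)^T P_2^{-1}(x-\mu_2) \\
&\quad = x^T P_1^{-1} x - 2x^T P_1^{-1}\mu_1 + \mu_1^T P_1^{-1}\mu_1
       + x^T P_2^{-1} x - 2x^T P_2^{-1}\mu_2 + \mu_2^T P_2^{-1}\mu_2 \\
&\quad = x^T (P_1^{-1} + P_2^{-1}) x - 2x^T (P_1^{-1}\mu_1 + P_2^{-1}\mu_2)
       + \mu_1^T P_1^{-1}\mu_1 + \mu_2^T P_2^{-1}\mu_2
\end{aligned}
\tag{A.2}
\]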

Examining the form of Eq. (A.2), we see that the resulting function will be a
Gaussian PDF with a scaled volume. Denoting $\mu_3$ and $P_3$ as the mean and covariance
of the resultant Gaussian, and $\alpha$ as the volume scaling factor, we seek to fit Eq. (A.1)
into the form:

\[
\alpha\,\mathcal{N}\{x; \mu_3, P_3\} = \alpha\,|2\pi P_3|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}(x-\mu_3)^T P_3^{-1}(x-\mu_3)\right\}
\tag{A.3}
\]

where the exponent expands to:

\[
(x-\mu_3)^T P_3^{-1}(x-\mu_3) = x^T P_3^{-1} x - 2x^T P_3^{-1}\mu_3 + \mu_3^T P_3^{-1}\mu_3
\tag{A.4}
\]

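Matching the coefficients of the $x^T(\cdot)x$ terms in Eq. (A.2) to those in Eq. (A.4), we find:

\[
\begin{aligned}
x^T P_3^{-1} x &= x^T (P_1^{-1} + P_2^{-1}) x \quad \forall\, x \\
\therefore\; P_3^{-1} &= P_1^{-1} + P_2^{-1} \\
\therefore\; P_3 &= (P_1^{-1} + P_2^{-1})^{-1} \\
&= P_1 - P_1(P_1+P_2)^{-1}P_1 = P_2 - P_2(P_1+P_2)^{-1}P_2
\end{aligned}
\tag{A.5}
\]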

where the final equality is due to the matrix inversion lemma [34:213]. Similarly,
matching the $x$ coefficients and using the result of Eq. (A.5):

\[
\begin{aligned}
2x^T P_3^{-1}\mu_3 &= 2x^T (P_1^{-1}\mu_1 + P_2^{-1}\mu_2) \quad \forall\, x \\
\therefore\; \mu_3 &= P_3 (P_1^{-1}\mu_1 + P_2^{-1}\mu_2) \\
&= P_3 P_1^{-1}\mu_1 + P_3 P_2^{-1}\mu_2 \\
&= [P_1 - P_1(P_1+P_2)^{-1}P_1]P_1^{-1}\mu_1 + [P_2 - P_2(P_1+P_2)^{-1}P_2]P_2^{-1}\mu_2 \\
&= \mu_1 + \mu_2 - P_1(P_1+P_2)^{-1}\mu_1 - P_2(P_1+P_2)^{-1}\mu_2
\end{aligned}
\tag{A.6}
\]

From Eqs. (A.5) and (A.6) we can expand the final term in Eq. (A.4):


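\[
\begin{aligned}
\mu_3^T P_3^{-1}\mu_3
&= (\mu_1^T P_1^{-1} + \mu_2^T P_2^{-1})\,P_3 P_3^{-1} P_3\,(P_1^{-1}\mu_1 + P_2^{-1}\mu_2) \\
&= \mu_1^T P_1^{-1} P_3 P_1^{-1}\mu_1 + 2\mu_1^T P_1^{-1} P_3 P_2^{-1}\mu_2 + \mu_2^T P_2^{-1} P_3 P_2^{-1}\mu_2 \\
&= \mu_1^T P_1^{-1}[P_1 - P_1(P_1+P_2)^{-1}P_1]P_1^{-1}\mu_1 + 2\mu_1^T P_1^{-1} P_3 P_2^{-1}\mu_2 \\
&\quad + \mu_2^T P_2^{-1}[P_2 - P_2(P_1+P_2)^{-1}P_2]P_2^{-1}\mu_2 \\
&= \mu_1^T P_1^{-1}\mu_1 - \mu_1^T(P_1+P_2)^{-1}\mu_1 + \mu_2^T P_2^{-1}\mu_2 - \mu_2^T(P_1+P_2)^{-1}\mu_2 \\
&\quad + 2\mu_1^T P_1^{-1} P_3 P_2^{-1}\mu_2 \\
&= \mu_1^T P_1^{-1}\mu_1 + \mu_2^T P_2^{-1}\mu_2 - (\mu_1-\mu_2)^T(P_1+P_2)^{-1}(\mu_1-\mu_2) \\
&\quad + 2\mu_1^T P_1^{-1} P_3 P_2^{-1}\mu_2 - 2\mu_1^T(P_1+P_2)^{-1}\mu_2 \\
&= \mu_1^T P_1^{-1}\mu_1 + \mu_2^T P_2^{-1}\mu_2 - (\mu_1-\mu_2)^T(P_1+P_2)^{-1}(\mu_1-\mu_2) \\
&\quad + 2\mu_1^T\left[P_1^{-1} P_3 P_2^{-1} - (P_1+P_2)^{-1}\right]\mu_2
\end{aligned}
\tag{A.7}
\]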


Manipulating  the  weighting  matrix  on  the  cross-term: 

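\[
\begin{aligned}
P_1^{-1} P_3 P_2^{-1} - (P_1+P_2)^{-1}
&= P_1^{-1}[P_1 - P_1(P_1+P_2)^{-1}P_1]P_2^{-1} - (P_1+P_2)^{-1} \\
&= P_2^{-1} - (P_1+P_2)^{-1}P_1P_2^{-1} - (P_1+P_2)^{-1} \\
&= (P_1+P_2)^{-1}(P_1+P_2)P_2^{-1} - (P_1+P_2)^{-1}P_1P_2^{-1} - (P_1+P_2)^{-1} \\
&= (P_1+P_2)^{-1}\left[(P_1+P_2)P_2^{-1} - P_1P_2^{-1} - I\right] \\
&= (P_1+P_2)^{-1}\left[P_1P_2^{-1} + I - P_1P_2^{-1} - I\right] \\
&= (P_1+P_2)^{-1}[0] \\
&= 0
\end{aligned}
\tag{A.8}
\]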

Hence substituting Eq. (A.8) into Eq. (A.7):

\[
\mu_3^T P_3^{-1}\mu_3 = \mu_1^T P_1^{-1}\mu_1 + \mu_2^T P_2^{-1}\mu_2 - (\mu_1-\mu_2)^T(P_1+P_2)^{-1}(\mu_1-\mu_2)
\tag{A.9}
\]

Finally equating the expressions in Eqs. (A.1) and (A.3):

\[
\mathcal{N}\{x; \mu_1, P_1\}\,\mathcal{N}\{x; \mu_2, P_2\} = \alpha\,\mathcal{N}\{x; \mu_3, P_3\}
\tag{A.10}
\]


Expanding  each  side  of  the  expression: 

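\[
\begin{aligned}
\text{LHS} &= |2\pi P_1|^{-\frac{1}{2}}\,|2\pi P_2|^{-\frac{1}{2}} \cdot
\exp\left\{-\tfrac{1}{2}\left[x^T P_3^{-1} x - 2x^T P_3^{-1}\mu_3 + \mu_1^T P_1^{-1}\mu_1 + \mu_2^T P_2^{-1}\mu_2\right]\right\} \\
\text{RHS} &= \alpha\,|2\pi P_3|^{-\frac{1}{2}} \cdot
\exp\Big\{-\tfrac{1}{2}\Big[x^T P_3^{-1} x - 2x^T P_3^{-1}\mu_3 + \mu_1^T P_1^{-1}\mu_1 + \mu_2^T P_2^{-1}\mu_2 \\
&\qquad\qquad - (\mu_1-\mu_2)^T(P_1+P_2)^{-1}(\mu_1-\mu_2)\Big]\Big\}
\end{aligned}
\tag{A.11}
\]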


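Using the last remaining variable $\alpha$ to satisfy the equality of Eqs. (A.10) and (A.11):

\[
\alpha = \sqrt{\frac{|2\pi P_3|}{|2\pi P_1|\,|2\pi P_2|}}\,
\exp\left\{-\tfrac{1}{2}(\mu_1-\mu_2)^T(P_1+P_2)^{-1}(\mu_1-\mu_2)\right\}
\tag{A.12}
\]

where:

\[
\begin{aligned}
\sqrt{\frac{|2\pi P_3|}{|2\pi P_1|\,|2\pi P_2|}}
&= \left(|2\pi P_1|\,|2\pi P_3|^{-1}\,|2\pi P_2|\right)^{-\frac{1}{2}} \\
&= |2\pi P_1 P_3^{-1} P_2|^{-\frac{1}{2}} \\
&= |2\pi P_1 (P_1^{-1} + P_2^{-1}) P_2|^{-\frac{1}{2}} \\
&= |2\pi (P_1 P_1^{-1} P_2 + P_1 P_2^{-1} P_2)|^{-\frac{1}{2}} \\
&= |2\pi (P_1 + P_2)|^{-\frac{1}{2}}
\end{aligned}
\]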


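Hence:

\[
\mathcal{N}\{x; \mu_1, P_1\}\,\mathcal{N}\{x; \mu_2, P_2\} = \alpha\,\mathcal{N}\{x; \mu_3, P_3\}
\tag{A.13}
\]

where:

\[
\begin{aligned}
P_3 &= (P_1^{-1} + P_2^{-1})^{-1} \\
    &= P_1 - P_1(P_1+P_2)^{-1}P_1 \\
    &= P_2 - P_2(P_1+P_2)^{-1}P_2 \\
\mu_3 &= P_3(P_1^{-1}\mu_1 + P_2^{-1}\mu_2) \\
      &= \mu_1 + \mu_2 - P_1(P_1+P_2)^{-1}\mu_1 - P_2(P_1+P_2)^{-1}\mu_2 \\
\alpha &= |2\pi(P_1+P_2)|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}(\mu_1-\mu_2)^T(P_1+P_2)^{-1}(\mu_1-\mu_2)\right\} \\
       &= \mathcal{N}\{\mu_1; \mu_2, P_1+P_2\}
\end{aligned}
\]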


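Considering the special case where $\mu_1 = \mu_2 = \mu$ and $P_1 = P_2 = P$:

\[
\begin{aligned}
P_3 &= (P^{-1} + P^{-1})^{-1} = (2P^{-1})^{-1} = \tfrac{1}{2}P \\
\mu_3 &= P_3(P^{-1}\mu + P^{-1}\mu) = \tfrac{1}{2}P(2P^{-1}\mu) = \mu \\
\alpha &= |2\pi(P+P)|^{-\frac{1}{2}} \exp\left\{-\tfrac{1}{2}(\mu-\mu)^T(2P)^{-1}(\mu-\mu)\right\} = |4\pi P|^{-\frac{1}{2}}
\end{aligned}
\tag{A.14}
\]

Hence:

\[
\left[\mathcal{N}\{x; \mu, P\}\right]^2 = |4\pi P|^{-\frac{1}{2}}\,\mathcal{N}\{x; \mu, \tfrac{1}{2}P\}
\tag{A.15}
\]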
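The result of this section is easy to check numerically. The following sketch evaluates $\alpha$, $\mu_3$ and $P_3$ from the expressions accompanying Eq. (A.13) and compares $\alpha\,\mathcal{N}\{x; \mu_3, P_3\}$ against the direct product at a test point; the function names and the example means and covariances are chosen purely for illustration.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Evaluate the multivariate Gaussian density N{x; mean, cov}."""
    d = x - mean
    norm = np.sqrt(np.linalg.det(2.0 * np.pi * cov))
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / norm

def gaussian_product(mu1, P1, mu2, P2):
    """Return (alpha, mu3, P3) such that N1 * N2 = alpha * N{x; mu3, P3}."""
    S_inv = np.linalg.inv(P1 + P2)
    P3 = P1 - P1 @ S_inv @ P1                              # Eq. (A.5)
    mu3 = P3 @ (np.linalg.solve(P1, mu1) + np.linalg.solve(P2, mu2))  # Eq. (A.6)
    alpha = gaussian_pdf(mu1, mu2, P1 + P2)                # alpha = N{mu1; mu2, P1 + P2}
    return alpha, mu3, P3

# Illustrative values only.
mu1, P1 = np.array([0.0, 1.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
mu2, P2 = np.array([1.0, -1.0]), np.array([[1.0, -0.2], [-0.2, 3.0]])
x = np.array([0.4, 0.2])

alpha, mu3, P3 = gaussian_product(mu1, P1, mu2, P2)
direct = gaussian_pdf(x, mu1, P1) * gaussian_pdf(x, mu2, P2)
combined = alpha * gaussian_pdf(x, mu3, P3)
print(direct, combined)
```

Passing the same mean and covariance for both factors reproduces the special case of Eqs. (A.14) and (A.15), with `P3` equal to half the covariance.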


A.2 Modified Gating Algorithm

The measurement gating algorithm described in Section 2.5.1 centers on the
following calculation:

\[
[z_j(k) - \hat{z}_i(k|k-1)]^T S_i(k)^{-1} [z_j(k) - \hat{z}_i(k|k-1)] < \gamma
\tag{A.16}
\]

where $z_j(k)$ is the $j$-th measurement in the $k$-th scan, $\hat{z}_i(k|k-1)$ is the predicted
measurement for the $i$-th hypothesis, and $S_i(k)$ is the covariance of the residual
for hypothesis $i$ formed with the target-originated measurement. This expression
requires the calculation of the difference of two vectors, followed by the multiplication
of a matrix by a vector, and finally the inner product of two vectors. For an
$N$-dimensional measurement space, the first operation will require $N$ additions, the
second operation will require $N^2$ multiplications and $N(N-1)$ additions, and the
final operation will require $N$ multiplications and $(N-1)$ additions. This totals
$(N^2+N)$ multiplications and $(N^2+N-1)$ additions.¹ While this may seem a small
number, these calculations must be repeated for every pairing of hypothesis and
measurement. The matrix inversion is not included in the calculation as this needs
to be performed only once for each hypothesis; it does not need to be repeated for
each measurement considered. As described in Section 4.4.2, the region populated
by clutter measurements can contain on the order of 48,000 measurements for the
latter simulations, and the algorithms being tested maintain up to 35 hypotheses
between processing intervals, hence these calculations must be performed 1,680,000
times (on average) in each processing cycle.
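The arithmetic above can be reproduced directly. In the short sketch below, the measurement dimension `N = 2` is an assumption made only to give concrete totals, and the function name is hypothetical.

```python
def gating_operation_counts(n_dim):
    """Operation counts for one evaluation of Eq. (A.16), excluding the inversion."""
    multiplications = n_dim**2 + n_dim   # N^2 (matrix-vector) + N (inner product)
    additions = n_dim**2 + n_dim - 1     # N + N(N - 1) + (N - 1)
    return multiplications, additions

clutter_measurements = 48_000  # order of magnitude quoted in the text
hypotheses = 35                # maximum number of hypotheses retained
evaluations = clutter_measurements * hypotheses

mults, adds = gating_operation_counts(2)  # N = 2 assumed for illustration
print(evaluations, evaluations * mults, evaluations * adds)
```

For the assumed two-dimensional measurement space this amounts to roughly ten million multiplications per processing cycle if every pairing is tested with the full quadratic form.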


As illustrated in Figure A.1(a), the gating procedure described by Eq. (A.16)
determines whether or not a given measurement is within an ellipse that is centered
on the measurement prediction $\hat{z}_i(k|k-1)$, and with major and minor axis and
orientation that are determined by the residual covariance $S_i(k)$. The idea of the
following development is to form a square which is aligned with the coordinate axes
and completely encloses the ellipse such that if a measurement is outside of the
square it can be discarded without performing the calculation in Eq. (A.16). To
determine whether or not a measurement lies within a square requires only $2N$ logical
comparisons, hence avoiding the complex calculations described previously. The
calculation of Eq. (A.16) can then be performed for the relatively small number of
measurements which are found to be within the enclosing square. This is illustrated
in Figure A.1(c).

¹This may be reduced somewhat by exploiting the symmetry of the covariance matrix, as utilized
in Section 3.3.4.

Dividing both sides of Eq. (A.16) by $\gamma$, we obtain the following equation:

\[
[z_j(k) - \hat{z}_i(k|k-1)]^T [\gamma S_i(k)]^{-1} [z_j(k) - \hat{z}_i(k|k-1)] < 1
\tag{A.17}
\]


Following from [52:335-336], the lengths of the semi-major and semi-minor axes of this
ellipse will be the square roots of the eigenvalues of $\gamma S_i(k)$, and the orientation of
the axes will be in the directions of the corresponding eigenvectors. If a circle is drawn
centered on the measurement prediction ($\hat{z}_i(k|k-1)$) with a radius of the square root
of the maximum eigenvalue² of $\gamma S_i(k)$ (denoted $\sqrt{\lambda_1}$), then this will be the
smallest circle which encloses the gating ellipse. This is illustrated in Figure A.1(b). It
is then an easy matter to form the square which encloses the circle, and is aligned to the
coordinate axes, as illustrated in Figure A.1(c). The square will be centered on the
measurement prediction (as was the circle), and will have a side of $2\sqrt{\lambda_1}$. The gating
operation can thus be performed first using this square, avoiding the calculation of
Eq. (A.16) for the vast majority of the measurements, providing a major computational saving.
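A minimal sketch of the two-stage procedure is given below, using NumPy. The function name, the array layout (one measurement per row), and the example covariance and gate threshold are assumptions of this sketch; the eigenvalue computation and the inversion are performed once per hypothesis, as described above.

```python
import numpy as np

def two_stage_gate(measurements, z_pred, S, gamma):
    """Gate measurements (one per row) against a single hypothesis.

    Stage 1: cheap axis-aligned square test with half-side sqrt(lambda_max),
             where lambda_max is the largest eigenvalue of gamma * S.
    Stage 2: full quadratic form of Eq. (A.16), only for survivors of stage 1.
    Returns (gated, in_square) boolean masks.
    """
    # Per-hypothesis precomputation (not repeated for each measurement).
    half_side = np.sqrt(np.linalg.eigvalsh(gamma * S).max())
    S_inv = np.linalg.inv(S)

    residuals = measurements - z_pred
    in_square = np.all(np.abs(residuals) <= half_side, axis=1)

    gated = np.zeros(len(measurements), dtype=bool)
    r = residuals[in_square]
    d2 = np.einsum('ni,ij,nj->n', r, S_inv, r)  # r^T S^{-1} r for each survivor
    gated[in_square] = d2 < gamma
    return gated, in_square

# Illustrative clutter field and gate; all numbers are assumed.
rng = np.random.default_rng(1)
meas = rng.uniform(-20.0, 20.0, size=(5000, 2))
S = np.array([[4.0, 1.0], [1.0, 2.0]])
gated, in_square = two_stage_gate(meas, np.zeros(2), S, gamma=9.21)
print(in_square.sum(), gated.sum())
```

Because the square encloses the ellipse, the set of gated measurements is identical to that obtained by applying Eq. (A.16) to every measurement; only the number of quadratic forms evaluated changes.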


²While calculation of the eigenvalues of a matrix is itself a computationally demanding operation,
this will only need to be performed once for each hypothesis, not for each measurement, hence the
computational burden associated with it is not of concern.

Figure A.1. Measurement gating: the gating equation describes an
ellipse as shown in (a); the smallest circle enclosing the
ellipse is shown in (b); the square aligned with coordinate
axes enclosing the circle is shown in (c).
[Legend: + measurements; • target]

A.3 Switching Bayesian Transition Probability

The switching model estimator approximation discussed in Section 4.6
implements the structure of the full switching Bayesian algorithm shown in Figure 2.3,
employing the ISD initialization algorithm to combine hypotheses at the end of each
processing cycle. It is quite possible that estimates arising from different models in
the latest processing interval will be merged in the hypothesis reduction process. In
order to propagate these estimates to the following processing interval, a different
form of transition probability will be necessary. For example, if the estimates from
models 1 and 2 in the $(k-1)$-th processing cycle are combined in the hypothesis
reduction process, then the transition probability required to weight the model
estimates in the $k$-th processing cycle will be $P\{M_{k,j} \mid M_{k-1,1} \cup M_{k-1,2}\}$, rather than the
standard Markov transition probability $P\{M_{k,j} \mid M_{k-1,i}\}$. To see the source of this
modified form, consider the expression in which the transition probability first arose,
Eq. (2.39):

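\[
\begin{aligned}
P\{M^{k,l} \mid Z^k\} &= P\{M^{k,l} \mid z(k), Z^{k-1}\} \\
&= \frac{f\{M^{k,l}, z(k) \mid Z^{k-1}\}}{f\{z(k) \mid Z^{k-1}\}} \\
&= \frac{f\{z(k) \mid M^{k,l}, Z^{k-1}\}\,P\{M^{k,l} \mid Z^{k-1}\}}{f\{z(k) \mid Z^{k-1}\}} \\
&= \frac{f\{z(k) \mid M^{k,l}, Z^{k-1}\}\,P\{M_{k,j}, M^{k-1,l'} \mid Z^{k-1}\}}{f\{z(k) \mid Z^{k-1}\}} \\
&= \frac{f\{z(k) \mid M^{k,l}, Z^{k-1}\}\,P\{M_{k,j} \mid M^{k-1,l'}, Z^{k-1}\}\,P\{M^{k-1,l'} \mid Z^{k-1}\}}{f\{z(k) \mid Z^{k-1}\}}
\end{aligned}
\tag{A.18}
\]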

The modification due to merging of models commences from the second-last line of
Eq. (A.18). If hypotheses are merged, then the latter term in the numerator will
become $P\{M_{k,j}, M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\}$, which can be expanded as (see footnote 3):

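$$
\begin{aligned}
P\{M_{k,j}, M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\}
&= P\{M_{k,j} \mid M^{k-1,l'_1} \cup M^{k-1,l'_2}, Z^{k-1}\}\, P\{M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\} \\
&= P\{M_{k,j} \mid M_{k-1,1}, M^{k-2,l''_1} \cup M_{k-1,2}, M^{k-2,l''_2}, Z^{k-1}\}\, P\{M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\} \\
&= P\{M_{k,j} \mid M_{k-1,1} \cup M_{k-1,2}\}\, P\{M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\}
\end{aligned}
\tag{A.19}
$$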

3For this example, the two model history events which are to be merged, $M^{k-1,l'_1}$ and $M^{k-1,l'_2}$,
are assumed to consist of models 1 and 2 respectively in the most recent entry (events
$M_{k-1,1}$ and $M_{k-1,2}$), alongside the previous model history events $M^{k-2,l''_1}$ and $M^{k-2,l''_2}$.
where the final step is due to the Markov assumption, as previously applied in
Eq. (2.39), and $P\{M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\}$ is the combined probability weight of the
merged models:

$$
P\{M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\} = P\{M^{k-1,l'_1} \mid Z^{k-1}\} + P\{M^{k-1,l'_2} \mid Z^{k-1}\}
\tag{A.20}
$$


This  latter  step  is  possible  because  all  model  history  events  are  disjoint. 

To evaluate the modified transition probability, consider the alternative expansion
of Eq. (A.19):

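$$
\begin{aligned}
P\{M_{k,j}, M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\}
&= P\{M_{k,j}, M^{k-1,l'_1} \mid Z^{k-1}\} + P\{M_{k,j}, M^{k-1,l'_2} \mid Z^{k-1}\} \\
&= P\{M_{k,j} \mid M^{k-1,l'_1}, Z^{k-1}\}\, P\{M^{k-1,l'_1} \mid Z^{k-1}\} \\
&\quad + P\{M_{k,j} \mid M^{k-1,l'_2}, Z^{k-1}\}\, P\{M^{k-1,l'_2} \mid Z^{k-1}\} \\
&= P\{M_{k,j} \mid M_{k-1,1}, M^{k-2,l''_1}, Z^{k-1}\}\, P\{M^{k-1,l'_1} \mid Z^{k-1}\} \\
&\quad + P\{M_{k,j} \mid M_{k-1,2}, M^{k-2,l''_2}, Z^{k-1}\}\, P\{M^{k-1,l'_2} \mid Z^{k-1}\} \\
&= P\{M_{k,j} \mid M_{k-1,1}\}\, P\{M^{k-1,l'_1} \mid Z^{k-1}\} + P\{M_{k,j} \mid M_{k-1,2}\}\, P\{M^{k-1,l'_2} \mid Z^{k-1}\}
\end{aligned}
\tag{A.21}
$$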

where $P\{M^{k-1,l'_1} \mid Z^{k-1}\}$ and $P\{M^{k-1,l'_2} \mid Z^{k-1}\}$ are the probabilities of the two
hypotheses to be merged before merging. Equating the expressions of Eqs. (A.19) and
(A.21), we obtain:

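$$
\begin{aligned}
P\{M_{k,j}, M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\}
&= P\{M_{k,j} \mid M_{k-1,1} \cup M_{k-1,2}\}\, P\{M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\} \\
&= P\{M_{k,j} \mid M_{k-1,1}\}\, P\{M^{k-1,l'_1} \mid Z^{k-1}\} + P\{M_{k,j} \mid M_{k-1,2}\}\, P\{M^{k-1,l'_2} \mid Z^{k-1}\}
\end{aligned}
\tag{A.22}
$$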
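thus:

$$
P\{M_{k,j} \mid M_{k-1,1} \cup M_{k-1,2}\}
= \frac{P\{M_{k,j} \mid M_{k-1,1}\}\, P\{M^{k-1,l'_1} \mid Z^{k-1}\}}{P\{M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\}}
+ \frac{P\{M_{k,j} \mid M_{k-1,2}\}\, P\{M^{k-1,l'_2} \mid Z^{k-1}\}}{P\{M^{k-1,l'_1} \cup M^{k-1,l'_2} \mid Z^{k-1}\}}
\tag{A.23}
$$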


Considering the definition of the denominator of Eq. (A.23) in Eq. (A.20), the
result in Eq. (A.23) can be seen to be a weighted sum of the transition probabilities
from the models corresponding to the merged hypotheses to the new model under
consideration. The weights for this sum are simply the probabilities of the original
hypotheses that were merged together.
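
The weighted sum of Eq. (A.23) is simple to evaluate numerically. The following is a minimal sketch in C, not part of the thesis implementation; the function name mergedTransitionRow and its argument layout are assumptions made purely for illustration.

```c
#include <assert.h>

/* Sketch of Eq. (A.23): transition probabilities from a merged pair of
   hypotheses to each model j of the next processing cycle.

   row1[j], row2[j]: standard Markov transition probabilities
                     P{M_{k,j} | M_{k-1,1}} and P{M_{k,j} | M_{k-1,2}}
   w1, w2:           probabilities of the two hypotheses before merging
   out[j]:           receives P{M_{k,j} | M_{k-1,1} U M_{k-1,2}}

   The name and argument layout are hypothetical, chosen for this sketch. */
void mergedTransitionRow(const double *row1, double w1,
                         const double *row2, double w2,
                         int numModels, double *out)
{
  double wSum = w1 + w2;   /* combined weight of the merged models, Eq. (A.20) */
  int j;

  assert(w1 >= 0.0 && w2 >= 0.0 && wSum > 0.0);
  for (j = 0; j < numModels; j++)
    out[j] = (row1[j]*w1 + row2[j]*w2) / wSum;
}
```

For example, with transition rows (0.9, 0.1) and (0.2, 0.8) and pre-merge weights 0.3 and 0.1, the merged row is (0.725, 0.275), which still sums to one.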
Appendix B. Matrix Reference Manual

The following pages contain a reproduction of the world-wide web page entitled “Matrix
Reference Manual: Matrix Calculus”, maintained by Mr Mike Brookes of Imperial
College, University of London. The Uniform Resource Locator (URL) for the page
is http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html. Many thanks go to
Mr Mike Brookes for giving permission for the document to be reproduced in this
thesis.
Matrix Reference Manual

Matrix Calculus

Contents of Calculus Section

•  Notation
•  Derivatives of Linear, Quadratic and Cubic Products
•  Derivatives of Inverses, Trace and Determinant
•  Jacobians and Hessian matrices


Notation

•  $d/dx\,(\mathbf{y})$ is a vector whose $(i)$ element is $dy(i)/dx$
•  $d/d\mathbf{x}\,(y)$ is a vector whose $(i)$ element is $dy/dx(i)$
•  $d/d\mathbf{x}\,(\mathbf{y}^T)$ is a matrix whose $(i,j)$ element is $dy(j)/dx(i)$
•  $d/dx\,(\mathbf{Y})$ is a matrix whose $(i,j)$ element is $dy(i,j)/dx$
•  $d/d\mathbf{X}\,(y)$ is a matrix whose $(i,j)$ element is $dy/dx(i,j)$
•  $x_R$ and $x_I$ are the real and imaginary parts of $x$
•  $x^*$ is the complex conjugate of $x$
•  $j$ is the square root of $-1$

An expression, $y$, can only be differentiated with respect to a complex $x$ if it satisfies the Cauchy-Riemann
equations: $dy/dx_I = j\,dy/dx_R$. Expressions involving the complex conjugate or Hermitian transpose do
not normally satisfy this requirement, so separate expressions for $dy/dx_R$ and $dy/dx_I$ are given in these
cases.

In  the  expressions  below  matrices  and  vectors  A,  B,  C  do  not  depend  on  X. 

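Derivatives of Linear Products

•  $d/dx\,(\mathbf{A}\mathbf{Y}\mathbf{B}) = \mathbf{A}\,[d/dx\,(\mathbf{Y})]\,\mathbf{B}$
    o  $d/dx\,(\mathbf{A}\mathbf{y}) = \mathbf{A}\,[d/dx\,(\mathbf{y})]$
•  $d/d\mathbf{x}\,(\mathbf{x}^T\mathbf{A}) = \mathbf{A}$
    o  $d/d\mathbf{x}\,(\mathbf{x}^T) = \mathbf{I}$
    o  $d/d\mathbf{x}\,(\mathbf{x}^T\mathbf{a}) = d/d\mathbf{x}\,(\mathbf{a}^T\mathbf{x}) = \mathbf{a}$
•  $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}\mathbf{b}) = \mathbf{a}\mathbf{b}^T$
    o  $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}\mathbf{a}) = d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}^T\mathbf{a}) = \mathbf{a}\mathbf{a}^T$
•  $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}^T\mathbf{b}) = \mathbf{b}\mathbf{a}^T$
•  $d/dx\,(\mathbf{Y}\mathbf{Z}) = \mathbf{Y}\,[d/dx\,(\mathbf{Z})] + [d/dx\,(\mathbf{Y})]\,\mathbf{Z}$
•  $d/dx_R\,(\mathbf{Y}^H) = (d/dx_R\,(\mathbf{Y}))^H$
•  $d/dx_I\,(\mathbf{Y}^H) = (d/dx_I\,(\mathbf{Y}))^H$
•  $d/d\mathbf{x}_R\,(\mathbf{x}^H\mathbf{A}) = \mathbf{A}$
    o  $d/d\mathbf{x}_R\,(\mathbf{x}^H) = \mathbf{I}$
•  $d/d\mathbf{x}_I\,(\mathbf{x}^H\mathbf{A}) = -j\mathbf{A}$
    o  $d/d\mathbf{x}_I\,(\mathbf{x}^H) = -j\mathbf{I}$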


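Derivatives of Quadratic Products

•  $d/d\mathbf{x}\,(\mathbf{A}\mathbf{x}+\mathbf{b})^T\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e}) = \mathbf{A}^T\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e}) + \mathbf{D}^T\mathbf{C}^T(\mathbf{A}\mathbf{x}+\mathbf{b})$
    o  $d/d\mathbf{x}\,(\mathbf{x}^T\mathbf{C}\mathbf{x}) = (\mathbf{C}+\mathbf{C}^T)\mathbf{x}$
        ■  $[\mathbf{C}=\mathbf{C}^T]$: $d/d\mathbf{x}\,(\mathbf{x}^T\mathbf{C}\mathbf{x}) = 2\mathbf{C}\mathbf{x}$
        ■  $d/d\mathbf{x}\,(\mathbf{x}^T\mathbf{x}) = 2\mathbf{x}$
    o  $d/d\mathbf{x}\,(\mathbf{A}\mathbf{x}+\mathbf{b})^T(\mathbf{D}\mathbf{x}+\mathbf{e}) = \mathbf{A}^T(\mathbf{D}\mathbf{x}+\mathbf{e}) + \mathbf{D}^T(\mathbf{A}\mathbf{x}+\mathbf{b})$
        ■  $d/d\mathbf{x}\,(\mathbf{A}\mathbf{x}+\mathbf{b})^T(\mathbf{A}\mathbf{x}+\mathbf{b}) = 2\mathbf{A}^T(\mathbf{A}\mathbf{x}+\mathbf{b})$
    o  $[\mathbf{C}=\mathbf{C}^T]$: $d/d\mathbf{x}\,(\mathbf{A}\mathbf{x}+\mathbf{b})^T\mathbf{C}(\mathbf{A}\mathbf{x}+\mathbf{b}) = 2\mathbf{A}^T\mathbf{C}(\mathbf{A}\mathbf{x}+\mathbf{b})$
•  $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{b}) = \mathbf{X}(\mathbf{a}\mathbf{b}^T + \mathbf{b}\mathbf{a}^T)$
    o  $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}^T\mathbf{X}\mathbf{a}) = 2\mathbf{X}\mathbf{a}\mathbf{a}^T$
•  $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}^T\mathbf{C}\mathbf{X}\mathbf{b}) = \mathbf{C}^T\mathbf{X}\mathbf{a}\mathbf{b}^T + \mathbf{C}\mathbf{X}\mathbf{b}\mathbf{a}^T$
    o  $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}^T\mathbf{C}\mathbf{X}\mathbf{a}) = (\mathbf{C}+\mathbf{C}^T)\mathbf{X}\mathbf{a}\mathbf{a}^T$
    o  $[\mathbf{C}=\mathbf{C}^T]$: $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}^T\mathbf{C}\mathbf{X}\mathbf{a}) = 2\mathbf{C}\mathbf{X}\mathbf{a}\mathbf{a}^T$
•  $d/d\mathbf{X}\,((\mathbf{X}\mathbf{a}+\mathbf{b})^T\mathbf{C}(\mathbf{X}\mathbf{a}+\mathbf{b})) = (\mathbf{C}+\mathbf{C}^T)(\mathbf{X}\mathbf{a}+\mathbf{b})\mathbf{a}^T$
•  $d/d\mathbf{x}_R\,(\mathbf{A}\mathbf{x}+\mathbf{b})^H\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e}) = \mathbf{A}^H\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e}) + \mathbf{D}^T\mathbf{C}^T(\mathbf{A}\mathbf{x}+\mathbf{b})^*$
    o  $d/d\mathbf{x}_R\,(\mathbf{x}^H\mathbf{C}\mathbf{x}) = \mathbf{C}\mathbf{x} + \mathbf{C}^T\mathbf{x}^* = \mathbf{C}\mathbf{x} + (\mathbf{x}^H\mathbf{C})^T$
        ■  $[\mathbf{C}=\mathbf{C}^T]$: $d/d\mathbf{x}_R\,(\mathbf{x}^H\mathbf{C}\mathbf{x}) = 2\mathbf{C}\mathbf{x}_R$
        ■  $[\mathbf{C}=\mathbf{C}^H]$: $d/d\mathbf{x}_R\,(\mathbf{x}^H\mathbf{C}\mathbf{x}) = 2(\mathbf{C}\mathbf{x})_R$
        ■  $d/d\mathbf{x}_R\,(\mathbf{x}^H\mathbf{x}) = 2\mathbf{x}_R$
•  $d/d\mathbf{x}_I\,(\mathbf{A}\mathbf{x}+\mathbf{b})^H\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e}) = j\,(\mathbf{D}^T\mathbf{C}^T(\mathbf{A}\mathbf{x}+\mathbf{b})^* - \mathbf{A}^H\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e}))$
    o  $d/d\mathbf{x}_I\,(\mathbf{x}^H\mathbf{C}\mathbf{x}) = j\,(\mathbf{C}^T\mathbf{x}^* - \mathbf{C}\mathbf{x}) = j\,((\mathbf{x}^H\mathbf{C})^T - \mathbf{C}\mathbf{x})$
        ■  $[\mathbf{C}=\mathbf{C}^T]$: $d/d\mathbf{x}_I\,(\mathbf{x}^H\mathbf{C}\mathbf{x}) = 2\mathbf{C}\mathbf{x}_I$
        ■  $[\mathbf{C}=\mathbf{C}^H]$: $d/d\mathbf{x}_I\,(\mathbf{x}^H\mathbf{C}\mathbf{x}) = 2(\mathbf{C}\mathbf{x})_I$
        ■  $d/d\mathbf{x}_I\,(\mathbf{x}^H\mathbf{x}) = 2\mathbf{x}_I$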


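Derivatives of Cubic Products

•  $d/d\mathbf{x}\,(\mathbf{x}^T\mathbf{A}\mathbf{x}\mathbf{x}^T) = (\mathbf{A}+\mathbf{A}^T)\mathbf{x}\mathbf{x}^T + \mathbf{x}^T\mathbf{A}\mathbf{x}\,\mathbf{I}$

Derivatives of Inverses

•  $d/dx\,(\mathbf{Y}^{-1}) = -\mathbf{Y}^{-1}\,[d/dx\,(\mathbf{Y})]\,\mathbf{Y}^{-1}$ [2.1]
•  $d/d\mathbf{X}\,(\mathbf{a}^T\mathbf{X}^{-1}\mathbf{b}) = -\mathbf{X}^{-T}\mathbf{a}\mathbf{b}^T\mathbf{X}^{-T}$ [2.6]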

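Derivative of Trace

Note: matrix dimensions must result in an n*n argument for tr().

•  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X})) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}^T)) = \mathbf{I}$ [M]
•  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}^k)) = k(\mathbf{X}^{k-1})^T$
•  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}\mathbf{X}^k)) = \sum_{r=0}^{k-1} (\mathbf{X}^r\mathbf{A}\mathbf{X}^{k-r-1})^T$
•  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}\mathbf{X}^{-1}\mathbf{B})) = -(\mathbf{X}^{-1}\mathbf{B}\mathbf{A}\mathbf{X}^{-1})^T = -\mathbf{X}^{-T}\mathbf{A}^T\mathbf{B}^T\mathbf{X}^{-T}$ [2.5]
    o  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}\mathbf{X}^{-1})) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}^{-1}\mathbf{A})) = -\mathbf{X}^{-T}\mathbf{A}^T\mathbf{X}^{-T}$
•  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}^T\mathbf{X}\mathbf{B}^T)) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{B}\mathbf{X}^T\mathbf{A})) = \mathbf{A}\mathbf{B}$ [2.4]
    o  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}\mathbf{A}^T)) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}^T\mathbf{X})) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}^T\mathbf{A})) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}\mathbf{X}^T)) = \mathbf{A}$
•  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^T\mathbf{C})) = \mathbf{A}^T\mathbf{C}^T\mathbf{X}\mathbf{B}^T + \mathbf{C}\mathbf{A}\mathbf{X}\mathbf{B}$
    o  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}\mathbf{A}\mathbf{X}^T)) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}\mathbf{X}^T\mathbf{X})) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}^T\mathbf{X}\mathbf{A})) = \mathbf{X}(\mathbf{A}+\mathbf{A}^T)$
    o  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}^T\mathbf{A}\mathbf{X})) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{X}^T)) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{X}\mathbf{X}^T\mathbf{A})) = (\mathbf{A}+\mathbf{A}^T)\mathbf{X}$
•  $d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X})) = \mathbf{A}^T\mathbf{X}^T\mathbf{B}^T + \mathbf{B}^T\mathbf{X}^T\mathbf{A}^T$
•  [C: symmetric] $d/d\mathbf{X}\,(\mathrm{tr}((\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}\mathbf{A})) = d/d\mathbf{X}\,(\mathrm{tr}(\mathbf{A}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1})) = -(\mathbf{C}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1})(\mathbf{A}+\mathbf{A}^T)(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}$
•  [B, C: symmetric] $d/d\mathbf{X}\,(\mathrm{tr}((\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}(\mathbf{X}^T\mathbf{B}\mathbf{X}))) = d/d\mathbf{X}\,(\mathrm{tr}((\mathbf{X}^T\mathbf{B}\mathbf{X})(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1})) = -2(\mathbf{C}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1})\mathbf{X}^T\mathbf{B}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1} + 2\mathbf{B}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}$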


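Derivative of Determinant

Note: matrix dimensions must result in an n*n argument for det(). Some of the expressions below
involve inverses: these forms apply only if the quantity being inverted is square and non-singular.

•  $d/d\mathbf{X}\,(\det(\mathbf{X})) = d/d\mathbf{X}\,(\det(\mathbf{X}^T)) = \mathrm{ADJ}(\mathbf{X})^T = \det(\mathbf{X})\,\mathbf{X}^{-T}$
    o  $d/d\mathbf{X}\,(\det(\mathbf{A}\mathbf{X}\mathbf{B})) = \mathbf{A}^T\mathrm{ADJ}(\mathbf{A}\mathbf{X}\mathbf{B})^T\mathbf{B}^T = \det(\mathbf{A}\mathbf{X}\mathbf{B})\,\mathbf{A}^T(\mathbf{A}\mathbf{X}\mathbf{B})^{-T}\mathbf{B}^T = \det(\mathbf{A}\mathbf{X}\mathbf{B})\,\mathbf{X}^{-T}$
    o  $d/d\mathbf{X}\,(\ln(\det(\mathbf{A}\mathbf{X}\mathbf{B}))) = \mathbf{A}^T(\mathbf{A}\mathbf{X}\mathbf{B})^{-T}\mathbf{B}^T = \mathbf{X}^{-T}$
•  $d/d\mathbf{X}\,(\det(\mathbf{X}^k)) = k\,\det(\mathbf{X}^k)\,\mathbf{X}^{-T}$
    o  $d/d\mathbf{X}\,(\ln(\det(\mathbf{X}^k))) = k\mathbf{X}^{-T}$
•  [Real] $d/d\mathbf{X}\,(\det(\mathbf{X}^T\mathbf{C}\mathbf{X})) = \det(\mathbf{X}^T\mathbf{C}\mathbf{X})\,(\mathbf{C}+\mathbf{C}^T)\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}$
    o  [C: Real, Symmetric] $d/d\mathbf{X}\,(\det(\mathbf{X}^T\mathbf{C}\mathbf{X})) = 2\det(\mathbf{X}^T\mathbf{C}\mathbf{X})\,\mathbf{C}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}$
•  [C: Real, Symmetric] $d/d\mathbf{X}\,(\ln(\det(\mathbf{X}^T\mathbf{C}\mathbf{X}))) = 2\mathbf{C}\mathbf{X}(\mathbf{X}^T\mathbf{C}\mathbf{X})^{-1}$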


Jacobian

If $\mathbf{y}$ is a function of $\mathbf{x}$, then $d\mathbf{y}^T/d\mathbf{x}$ is the Jacobian matrix of $\mathbf{y}$ with respect to $\mathbf{x}$.

Its determinant, $|d\mathbf{y}^T/d\mathbf{x}|$, is the Jacobian of $\mathbf{y}$ with respect to $\mathbf{x}$ and represents the ratio of the
hyper-volumes $d\mathbf{y}$ and $d\mathbf{x}$. The Jacobian occurs when changing variables in an integration:
$\int f(\mathbf{y})\,d\mathbf{y} = \int f(\mathbf{y}(\mathbf{x}))\,|d\mathbf{y}^T/d\mathbf{x}|\,d\mathbf{x}$.

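Hessian matrix

If $f$ is a function of $\mathbf{x}$ then the symmetric matrix $d^2f/d\mathbf{x}^2 = d/d\mathbf{x}\,(df/d\mathbf{x})$ is the Hessian matrix of $f(\mathbf{x})$. A
value of $\mathbf{x}$ for which $df/d\mathbf{x} = \mathbf{0}$ corresponds to a minimum, maximum or saddle point according to
whether the Hessian is positive definite, negative definite or indefinite.

•  $d^2/d\mathbf{x}^2\,(\mathbf{a}^T\mathbf{x}) = \mathbf{0}$
•  $d^2/d\mathbf{x}^2\,(\mathbf{A}\mathbf{x}+\mathbf{b})^T\mathbf{C}(\mathbf{D}\mathbf{x}+\mathbf{e}) = \mathbf{A}^T\mathbf{C}\mathbf{D} + \mathbf{D}^T\mathbf{C}^T\mathbf{A}$
    o  $d^2/d\mathbf{x}^2\,(\mathbf{x}^T\mathbf{C}\mathbf{x}) = \mathbf{C}+\mathbf{C}^T$
        ■  $d^2/d\mathbf{x}^2\,(\mathbf{x}^T\mathbf{x}) = 2\mathbf{I}$
    o  $d^2/d\mathbf{x}^2\,(\mathbf{A}\mathbf{x}+\mathbf{b})^T(\mathbf{D}\mathbf{x}+\mathbf{e}) = \mathbf{A}^T\mathbf{D} + \mathbf{D}^T\mathbf{A}$
        ■  $d^2/d\mathbf{x}^2\,(\mathbf{A}\mathbf{x}+\mathbf{b})^T(\mathbf{A}\mathbf{x}+\mathbf{b}) = 2\mathbf{A}^T\mathbf{A}$
    o  [C: symmetric]: $d^2/d\mathbf{x}^2\,(\mathbf{A}\mathbf{x}+\mathbf{b})^T\mathbf{C}(\mathbf{A}\mathbf{x}+\mathbf{b}) = 2\mathbf{A}^T\mathbf{C}\mathbf{A}$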
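
As a small worked illustration (not part of the reproduced page), the quadratic-form entries for the gradient and the Hessian can be checked on a concrete case. The matrix $\mathbf{C}$ below is an arbitrary choice made for this sketch:

```latex
f(\mathbf{x}) = \mathbf{x}^T\mathbf{C}\mathbf{x}, \qquad
\mathbf{C} = \begin{bmatrix} 2 & 1 \\ 0 & -1 \end{bmatrix}
\quad\Rightarrow\quad
f(\mathbf{x}) = 2x_1^2 + x_1 x_2 - x_2^2

\frac{df}{d\mathbf{x}} = (\mathbf{C}+\mathbf{C}^T)\,\mathbf{x}
= \begin{bmatrix} 4x_1 + x_2 \\ x_1 - 2x_2 \end{bmatrix}, \qquad
\frac{d^2 f}{d\mathbf{x}^2} = \mathbf{C}+\mathbf{C}^T
= \begin{bmatrix} 4 & 1 \\ 1 & -2 \end{bmatrix}
```

The Hessian has determinant $-9 < 0$, so it is indefinite, and the stationary point $\mathbf{x} = \mathbf{0}$ is a saddle point.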


The Matrix Reference Manual is written by Mike Brookes, Imperial College, London, UK. Please send
any comments or suggestions to mike.brookes@ic.ac.uk
Appendix C. Source Code

The following listing contains the source code for the MEX implementation of the ISD
initialization algorithm, developed in Section 3.3.4. The source code was compiled
using lcc-win32, which is included with the student version of MATLAB® Release
12.

The function requires four input arguments. The first is a vector of length $N_h$,
which contains the probability weights of the $N_h$ hypotheses. The second is an $N \times N_h$
matrix, the columns of which contain the mean vectors for each of the hypotheses.
The third argument is a three-dimensional array of dimension $N \times N \times N_h$, which
contains the covariance matrices for each of the $N_h$ hypotheses. The final input is a
scalar, which specifies the number of hypotheses to which the $N_h$ should be reduced.
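
MATLAB stores arrays in column-major order, so inside a MEX function each of these arguments appears as a flat array of doubles. The sketch below shows the index arithmetic this implies. The helper names meanIndex and covIndex are hypothetical and do not appear in the listing, which writes the same expressions inline.

```c
#include <assert.h>

/* Zero-based offset of element i of the mean vector of hypothesis h
   in an N x Nh matrix stored column-major (numVar plays the role of N). */
int meanIndex(int i, int h, int numVar)
{
  assert(i >= 0 && i < numVar && h >= 0);
  return h*numVar + i;
}

/* Zero-based offset of element (i, j) of the covariance matrix of
   hypothesis h in an N x N x Nh array stored column-major. */
int covIndex(int i, int j, int h, int numVar)
{
  assert(i >= 0 && i < numVar && j >= 0 && j < numVar && h >= 0);
  return h*numVar*numVar + j*numVar + i;
}
```

With two state variables, for instance, the covariance of the third hypothesis (h = 2) occupies offsets 8 to 11 of the flat array.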

There  are  three  output  arguments  returned  by  the  function,  containing  the 
probability  weights,  mean  vectors  and  covariance  matrices  of  the  reduced  set  of 
hypotheses  in  the  same  format  as  the  input.  The  probability  weights  are  returned  in 
de-normalized  form,  such  that  they  will  not  necessarily  sum  to  unity;  normalization 
should  be  applied  as  a  later  step. 
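
The later normalization step amounts to dividing each returned weight by their sum. A minimal sketch in C follows, assuming the weights are held in a plain array of doubles; the function name normalizeWeights is hypothetical and is not part of the listing.

```c
#include <assert.h>

/* Rescales the n weights in w, in place, so that they sum to one.
   Returns the sum of the weights before rescaling.
   Hypothetical helper, shown only to illustrate the normalization step. */
double normalizeWeights(double *w, int n)
{
  double sum = 0.0;
  int i;

  assert(w != 0 && n > 0);
  for (i = 0; i < n; i++)
    sum += w[i];

  assert(sum > 0.0);   /* a mixture with zero total weight cannot be normalized */
  for (i = 0; i < n; i++)
    w[i] /= sum;

  return sum;
}
```

In MATLAB itself the same step is a single division of the weight vector by its sum.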
C.1 ISDInit.c

/* ISDInit.c

   Integral Square Difference Initialization Algorithm

   This MEX function performs the cost function-based mixture reduction
   described in Section 3.3.4 of the thesis. The implementation is
   highly optimized to avoid re-calculation of portions of the cost
   function which do not change when the reduction steps are taken, and
   it utilizes the efficient multivariate Gaussian evaluation described
   in the thesis.

   The Matlab function takes four inputs. The first is the vector which
   contains the probability weights for the numMix hypotheses. The
   second is a numVar x numMix matrix whose columns contain the mean
   vectors for the hypotheses. The third is a three-dimensional matrix
   of dimensions numVar x numVar x numMix, which contains the
   covariance matrices for the hypotheses. The final input is a scalar
   number (numNewMix) which indicates the number of mixture components
   to which the input mixture is to be simplified.

   The function provides three outputs, which contain the probability
   weights, mean vectors and covariance matrices for the reduced set of
   hypotheses.

   (c) 07 Jan 03 Flight Lieutenant Jason L. Williams, RAAF
   AFIT GE-03M */


#include "mex.h"
#include "matrix.h"
#include <math.h>
#include <float.h>

/* If DEBUG is set to '1', debugging information will be written to
   Matlab stdout device during execution. */
#define DEBUG 0

/* mergePossNum converts two component numbers to the one-dimensional
   index corresponding to that merge possibility (as used throughout
   this listing, m1 < m2 and both are zero-based) */
#define mergePossNum(m1,m2) ((m2)-1 + ((m1)*(2*numMix - ((m1)+3)) >> 1))

/* Assert function definition which works when not compiled in debug
   mode - if the specified condition is not true then the function
   is terminated and the specified error message is written to the
   screen */
#define jlwAssert(cond,message) {if (!(cond)) {mexErrMsgTxt(message);}}

/* Function prototypes */
void calcCurCost(void);
void calcCostOptions(void);
void copyMixtureParameters(void);
void deleteMixture(int mix);
void mergeMixtures(int mix1, int mix2);
void calcOrigCosts(void);
void calcOrigMergePoss(void);
void calcMergeParam(int m1, int m2);
double calcDist(double p1, double *mean1, double *cov1,
                double p2, double *mean2, double *cov2);
/* Global Variables

   Implementation makes extensive use of global variables to avoid the
   execution overhead associated with passing large data structures.

   Definitions of variables are as follows:

   Inv2PI: the constant (1/(2*pi))
   numMix: the number of mixture components in the original mixture
   numNewMix: the number of mixture components to which the mixture
     is to be simplified
   numVar: the number of variables - i.e. the dimensionality of
     the space in which the multivariate Gaussian mixture
     components reside
   numMergePoss: the number of possible merge actions which can be
     taken to simplify the original mixture - i.e. the number of
     unique pairs of two components selected from the original mixture
   numCurMix: the counter which tracks the number of mixture components
     as it is reduced from numMix to numNewMix
   mixMask: an array of flags indicating which components are still in
     the reduced mixture. When components are deleted, this flag is set
     to zero to indicate that the respective component should no longer
     be counted in the mixture.
   probs: the hypothesis probabilities of the original mixture
   means: the mean vectors of the original mixture
   covs: the covariance matrices of the original mixture
   newProbs: the hypothesis probabilities of the reduced mixture
   newMeans: the mean vectors of the reduced mixture
   newCovs: the covariance matrices of the reduced mixture
   muD: a temporary variable used to store the difference between two
     mean vectors
   P: a temporary variable used to store the sum of two covariance
     matrices
   Di: the inverse of the diagonal portion of the U-D factored
     covariance matrix
   mergep: temporary variable used to store the probability of the
     component resulting from the merging of two hypotheses
   mergeMu: temporary variable used to store the mean vector of the
     component resulting from the merging of two hypotheses
   mergeP: temporary variable used to store the covariance matrix of
     the component resulting from merging two hypotheses
   self: the matrix whose (i,j) component represents the similarity
     between components i and j of the reduced mixture
   cross: the matrix whose (i,j) component represents the similarity
     between component i of the original mixture and component j of the
     reduced mixture
   sumSelf: each entry of the sumSelf array contains the sum of the
     entries of the reduced mixture self-likeness matrix (self) which
     are due to the respective mixture - i.e., the sum of the row and
     column highlighted in the right-hand diagram of Figure 3.7 in the
     thesis
   sumCross: the sum of each of the columns of the cross-likeness
     matrix
   newSelf: matrix containing the new column/row for the self matrix
     for each merge possibility
   newCross: matrix containing the new column for the cross matrix for
     each merge possibility
   newSumSelf: array containing the sum of each of the newSelf columns.
     Entries of this array are actually the sum of the new row/column
     which would replace the previous row/column, as per the
     description of sumSelf above.
   newSumCross: array containing the sum of each of the newCross cols
   actMix1, actMix2: variables used to store the best action found so
     far. If the best action is a deletion, then actMix1 contains the
     component number to be deleted and actMix2 is 0; otherwise actMix1
     and actMix2 are the numbers of the components to be merged.
   actMergePoss: contains the merge possibility index corresponding
     to merging actMix1 and actMix2
   curCost: contains the current cost - i.e., the cost of the
     reduction steps performed already
   sumDist: contains the sum of the original mixture self-likeness
     matrix. The contents of this matrix do not change, hence this term
     can be used throughout the reduction to calculate the actual cost. */

const double Inv2PI = 1.591549430918954e-001;
int numMix, numNewMix, numVar, numMergePoss, numCurMix;
char *mixMask;
double *probs, *means, *covs, *newProbs, *newMeans, *newCovs;
double *muD, *P, *Di, mergep, *mergeMu, *mergeP;
double *self, *cross, *sumSelf, *sumCross;
double *newSelf, *newCross, *newSumSelf, *newSumCross;
int actMix1, actMix2, actMergePoss;
double curCost, sumDist;


/* mexFunction

   This is the root function which is called by Matlab

   nlhs contains the number of output arguments and plhs is the pointer
   to the output argument array; nrhs contains the number of input
   arguments and prhs is the pointer to the input argument array */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs,
                 const mxArray *prhs[])
{
  const mxArray *mxProbs, *mxMeans, *mxCovs, *mxNumNewMix; /* inputs */
  mxArray *mxNewProbs, *mxNewMeans, *mxNewCovs;             /* outputs */
  double doubNumNewMix, *outProbs, *outMeans, *outCovs;
  int newDims[3], numDims;
  const int *dims;

  /* Get inputs and verify input types */
  jlwAssert(nrhs == 4, "Four inputs required");
  mxProbs = prhs[0]; mxMeans = prhs[1]; mxCovs = prhs[2];
  mxNumNewMix = prhs[3];
  jlwAssert(mxGetClassID(mxProbs) == mxDOUBLE_CLASS &&
            !mxIsComplex(mxProbs), "Inputs must be real doubles");
  jlwAssert(mxGetClassID(mxMeans) == mxDOUBLE_CLASS &&
            !mxIsComplex(mxMeans), "Inputs must be real doubles");
  jlwAssert(mxGetClassID(mxCovs) == mxDOUBLE_CLASS &&
            !mxIsComplex(mxCovs), "Inputs must be real doubles");
  jlwAssert(mxGetClassID(mxNumNewMix) == mxDOUBLE_CLASS &&
            !mxIsComplex(mxNumNewMix), "Inputs must be real doubles");

  /* Check that dimensionality of inputs is consistent */
  numDims = mxGetNumberOfDimensions(mxProbs);
  if (numDims == 1) {
    dims = mxGetDimensions(mxProbs);
    numMix = dims[0];
  } else if (numDims == 2) {
    /* Accept either a row vector or a column vector of weights */
    dims = mxGetDimensions(mxProbs);
    if (dims[0] == 1)
      numMix = dims[1];
    else
      numMix = dims[0];
  } else {
    mexErrMsgTxt("Invalid probability array.");
  }
  /* Check dimensionality of means */
  jlwAssert(mxGetNumberOfDimensions(mxMeans) == 2,
            "means should be 2-dimensional");
  dims = mxGetDimensions(mxMeans);
  numVar = dims[0];
  jlwAssert(dims[1] == numMix,
            "Size of means inconsistent with size of probs");

  /* Check dimensionality of covariances */
  jlwAssert(mxGetNumberOfDimensions(mxCovs) == 3,
            "covs should be 3-dimensional");
  dims = mxGetDimensions(mxCovs);
  jlwAssert(dims[0] == numVar,
            "Size of covs inconsistent with size of probs and means");
  jlwAssert(dims[1] == numVar,
            "Size of covs inconsistent with size of probs and means");
  jlwAssert(dims[2] == numMix,
            "Size of covs inconsistent with size of probs and means");

  /* Check for valid number of mixture components */
  jlwAssert(mxGetNumberOfElements(mxNumNewMix) == 1,
            "Number of components should be scalar");
  doubNumNewMix = mxGetScalar(mxNumNewMix);
  numNewMix = (int) doubNumNewMix;
  jlwAssert(((double) numNewMix) == doubNumNewMix,
            "Number of components should be integer");
  jlwAssert(numNewMix > 0,
            "Number of components should be positive");
  jlwAssert(numNewMix < numMix,
            "Number of output components should be less than input number");

  /* Get pointers to the real data arrays of the inputs */
  probs = mxGetPr(mxProbs);
  means = mxGetPr(mxMeans);
  covs = mxGetPr(mxCovs);

  /* Create output data structures */
  jlwAssert(nlhs == 3, "Three outputs required.");
  plhs[0] = mxNewProbs = mxCreateDoubleMatrix(1, numNewMix, mxREAL);
  plhs[1] = mxNewMeans = mxCreateDoubleMatrix(numVar, numNewMix, mxREAL);
  newDims[0] = numVar; newDims[1] = numVar; newDims[2] = numNewMix;
  plhs[2] = mxNewCovs = mxCreateNumericArray(3, newDims, mxDOUBLE_CLASS,
                                             mxREAL);
  jlwAssert(mxNewProbs != NULL && mxNewMeans != NULL &&
            mxNewCovs != NULL, "Memory allocation failure");
  outProbs = mxGetPr(mxNewProbs);
  outMeans = mxGetPr(mxNewMeans);
  outCovs = mxGetPr(mxNewCovs);

  /* Allocate memory for temporary variables */
  muD = (double *) mxMalloc(sizeof(double)*numVar);
  P = (double *) mxMalloc(sizeof(double)*numVar*numVar);
  Di = (double *) mxMalloc(sizeof(double)*numVar);
  jlwAssert(muD != NULL && P != NULL && Di != NULL,
            "Memory allocation failure");

  /* Allocate memory for temporary variables for merging components */
  mergeMu = (double *) mxMalloc(sizeof(double)*numVar);
  mergeP = (double *) mxMalloc(sizeof(double)*numVar*numVar);
  jlwAssert(mergeMu != NULL && mergeP != NULL,
            "Memory allocation failure");
  /* Allocate memory for new mixture parameters */
  newProbs = (double *) mxMalloc(sizeof(double)*numMix);
  newMeans = (double *) mxMalloc(sizeof(double)*numMix*numVar);
  newCovs = (double *) mxMalloc(sizeof(double)*numMix*numVar*numVar);
  mixMask = (char *) mxMalloc(sizeof(char)*numMix);
  jlwAssert(newProbs != NULL && newMeans != NULL &&
            newCovs != NULL && mixMask != NULL, "Memory allocation failure");

  /* Allocate memory for distance matrices */
  self = (double *) mxMalloc(sizeof(double)*numMix*numMix);
  cross = (double *) mxMalloc(sizeof(double)*numMix*numMix);
  sumSelf = (double *) mxMalloc(sizeof(double)*numMix);
  sumCross = (double *) mxMalloc(sizeof(double)*numMix);
  jlwAssert(self != NULL && cross != NULL &&
            sumSelf != NULL && sumCross != NULL, "Memory allocation failure");

  /* Allocate memory for merge possibilities */
  numMergePoss = (numMix*(numMix-1)) >> 1;
  newSelf = (double *) mxMalloc(sizeof(double)*numMergePoss*numMix);
  newCross = (double *) mxMalloc(sizeof(double)*numMergePoss*numMix);
  newSumSelf = (double *) mxMalloc(sizeof(double)*numMergePoss);
  newSumCross = (double *) mxMalloc(sizeof(double)*numMergePoss);
  jlwAssert(newSelf != NULL && newCross != NULL &&
            newSumSelf != NULL && newSumCross != NULL,
            "Memory allocation failure");

  /* Set up structures */
  copyMixtureParameters();
  calcOrigCosts();
  calcOrigMergePoss();

  /* Reduce mixtures - this is the main loop for the reduction */
  for (numCurMix = numMix; numCurMix > numNewMix; numCurMix--) {
    /* calculate the current cost - the cost of the reduction steps
       already taken */
    calcCurCost();

    /* calculate the cost of each of the merge and deletion options */
    calcCostOptions();

    /* take the lowest cost option */
    if (actMix2 == 0) {
      /* Lowest cost option was to delete component actMix1 */
      deleteMixture(actMix1);
    } else {
      /* Lowest cost option was to merge components actMix1 and actMix2.
         Hence we remove actMix2 from the mixture and replace actMix1
         with the parameters for the merged components */
      deleteMixture(actMix2);
      mergeMixtures(actMix1, actMix2);
    }
  }

  /* Store results in Matlab output structure */
  {
    int mi, mo, i, j, k;
    mo = 0;
    for (mi = 0; mi < numMix; mi++) {
      if (mixMask[mi]) {
        for (i = 0; i < numVar; i++) {
          outMeans[mo*numVar + i] = newMeans[mi*numVar + i];
          /* Copy the stored triangle into both halves of the output */
          for (j = i; j < numVar; j++) {
            outCovs[mo*numVar*numVar + i + j*numVar] =
              outCovs[mo*numVar*numVar + j + i*numVar] =
              newCovs[mi*numVar*numVar + i + j*numVar];
          }
        }
        outProbs[mo] = newProbs[mi];
        mo++;
      }
    }
  }

  /* Deallocate memory */
  mxFree(newSumCross); mxFree(newSumSelf); mxFree(newCross);
  mxFree(newSelf); mxFree(sumCross); mxFree(sumSelf); mxFree(cross);
  mxFree(self); mxFree(mixMask); mxFree(newCovs); mxFree(newMeans);
  mxFree(newProbs); mxFree(mergeP); mxFree(mergeMu); mxFree(Di); mxFree(P);
  mxFree(muD);
}


/* calcCurCost - Calculates the current cost - i.e. the cost of the
                 reduction steps already chosen.

   Precondition: sumDist, mixMask, sumSelf and sumCross structures
                 populated and up to date
   Postcondition: curCost will contain the cost of the current reduced
                  PDF representation. */
void calcCurCost(void)
{
  register int i;

  /* Commence with the cost due to the original mixture
     self-likeness */
  curCost = sumDist;

  /* Add the cost components due to each mixture component in the
     cross-likeness and reduced self-likeness matrices */
  for (i = 0; i < numMix; i++) {
    if (mixMask[i]) {
      curCost += 0.5*(sumSelf[i] + self[i*(numMix+1)]) - 2*sumCross[i];
    }
  }
}
/* calcCostOptions - Calculates the cost of all options for deleting
                     or merging mixture components

   Precondition: mixMask, curCost, sumCross, sumSelf, self,
                 newSumCross, newSumSelf populated and up to date
   Postcondition: actMix1, actMix2 and actMergePoss contain values
                  indicating the lowest cost action. If actMix2 is zero
                  then the lowest cost action was to delete component
                  actMix1. Otherwise, the lowest cost action was to
                  merge actMix1 and actMix2, which corresponds to merge
                  possibility number actMergePoss. */
void calcCostOptions(void)
{
  register int i, j, mergePoss;
  register double minCost = DBL_MAX, costOpt;

  for (i = 0; i < numMix; i++) {
    if (mixMask[i]) {
      /* Calculate cost for deleting mixture */
      costOpt = curCost + 2*sumCross[i] - sumSelf[i];
      if (costOpt < minCost) {
        minCost = costOpt;
        actMix1 = i; actMix2 = 0;
      }

      for (j = i+1; j < numMix; j++) {
        if (mixMask[j]) {
          mergePoss = mergePossNum(i,j);

          /* Calculate cost for merging mixtures */
          costOpt = curCost + 2*sumCross[i] + 2*sumCross[j]
            - sumSelf[i] - sumSelf[j] + 2*self[i+j*numMix]
            - 2*newSumCross[mergePoss] + newSumSelf[mergePoss];
          if (costOpt < minCost) {
            minCost = costOpt;
            actMix1 = i; actMix2 = j;
            actMergePoss = mergePoss;
          }
        }
      }
    }
  }

  /* Print debugging information to screen if flag is true */
  if (DEBUG) {
    if (actMix2 == 0) {
      mexPrintf("Current cost %g; Deleting mixture %d for cost %g\n",
                curCost, actMix1, minCost);
    } else {
      mexPrintf("Current cost %g; Merging mix %d and %d for cost %g\n",
                curCost, actMix1, actMix2, minCost);
    }
  }
}
/* copyMixtureParameters - Copies probabilities, means and covariances
                           from original structures into new working
                           structures to provide the starting point for
                           the reduction process.

   Precondition: probs, means and covs contain the parameters for the
                 original mixtures, memory is allocated for newProbs,
                 newMeans and newCovs
   Postcondition: Data from probs, means and covs are copied into
                  newProbs, newMeans and newCovs. */
void copyMixtureParameters(void)
{
  register int i;
  int numElem;

  /* Copy probabilities */
  numElem = numMix;
  for (i = 0; i < numElem; i++)
    newProbs[i] = probs[i];

  /* Copy means */
  numElem *= numVar;
  for (i = 0; i < numElem; i++)
    newMeans[i] = means[i];

  /* Copy covariances */
  numElem *= numVar;
  for (i = 0; i < numElem; i++)
    newCovs[i] = covs[i];

  /* Initialize the current number of mixture components */
  numCurMix = numMix;
}

/* deleteMixture - Deletes the specified component, updates all costs

   Precondition: mix contains the index of the mixture to be deleted
   Postcondition: newSumSelf (self-likeness entries for each merge
                  possibility) and sumSelf (partial sums of
                  self-likeness entries for current reduced mixture) are
                  updated to reflect the new cost after the specified
                  component has been deleted */
void deleteMixture(int mix)
{
  register int m1, m2, mergePoss;

  /* Clear the flag for the mixture to indicate that it has been
     deleted */
  mixMask[mix] = 0;

  /* Update stored new columns for the cross-likeness and self-likeness
     matrices for all merge possibilities */
  for (m1 = 0; m1 < numMix; m1++) {
    if (mixMask[m1]) {
      for (m2 = m1 + 1; m2 < numMix; m2++) {
        if (mixMask[m2]) {
          mergePoss = mergePossNum(m1,m2);
          newSumSelf[mergePoss] -= 2*newSelf[mergePoss*numMix+mix];
        }
      }
    }
  }

  /* Update partial sums of the self-likeness matrix to reflect removal
     of component */
  for (m1 = 0; m1 < numMix; m1++) {
    if (mixMask[m1]) {
      /* Subtract self distances due to deleted component */
      sumSelf[m1] -= 2*self[mix*numMix+m1];
    }
  }
}

/* mergeMixtures - Updates all merge possibilities with the newly
                   merged component, placing the parameters for merged
                   components in mix1

   Precondition: mix1 and mix2 contain the indices of the two
                 components to be merged. mix2 should have been
                 deleted already (using deleteMixture())
   Postcondition: parameters of merged components are calculated and
                  stored in place of mix1; cross and self matrix
                  entries (and sum vector entries) are updated with new
                  costs; merge possibility cost structures are updated
                  to reflect the changes due to the merged components. */
void mergeMixtures(int mix1, int mix2)
{
  int m1, m2, m3, i, j, k, mergePoss;
  double d;

  /* Calculate the parameters (weight, mean, covariance) for the
     merged components */
  calcMergeParam(mix1,mix2);

  /* Store parameters for newly merged component in place of mix1 */
  for (i = 0; i < numVar; i++) {
    for (j = i; j < numVar; j++) {
      k = i + j*numVar;
      newCovs[mix1*numVar*numVar+k] = mergeP[k];
    }
    newMeans[mix1*numVar+i] = mergeMu[i];
  }
  newProbs[mix1] = mergep;

  /* Update distance matrices to reflect merge
     (using the pre-computed parameters from the merge possibility
     structure) */
  mergePoss = mergePossNum(mix1,mix2);
  for (m1 = 0; m1 < numMix; m1++) {
    /* Store cross distances for new component */
    cross[m1+mix1*numMix] = newCross[mergePoss*numMix+m1];

    if (mixMask[m1]) {
      /* Store self distances for new component & update sums */
      sumSelf[m1] -= 2*self[mix1+m1*numMix];
      d = self[mix1+m1*numMix] = self[m1+mix1*numMix] =
        newSelf[mergePoss*numMix+m1];
      sumSelf[m1] += 2*d;
    }
  }
  sumCross[mix1] = newSumCross[mergePoss];
  sumSelf[mix1] = newSumSelf[mergePoss];
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/*  Update  distances  for  all  merge  possibilities  */ 
for  (ml  =  0;  ml  <  numMix;  ml++)  { 
if  (mixMask [ml ] )  { 

for  (m2  =  ml  +  1;  m2  <  numMix;  m2++)  { 
if  (mixMask [m2 ] )  { 

mergePoss  =  mergePossNum (ml , m2 ) ; 
calcMergeParam (ml , m2 ) ; 

/*  If  the  merge  possibility  involves  the  modified  component 
then  everything  is  changed  and  has  to  be  recalculated  */ 
if  (ml  ==  mixl  |  |  m2  ==  mixl)  { 
newSumCross [mergePoss ]  =  0; 
newSumSelf [mergePoss ]  =0; 

for  (m3  =0;  m3  <  numMix;  m3++)  { 

/*  Calculate  distance  of  new  merge  possibility  to 
ori  ginal  comp  on  en  t  s  */ 
d  =  calcDist (mergep, mergeMu, mergeP , 

probs [m3] , Smeans [m3*numVar] ,  &covs [m3*numVar*numVar] ) ; 
newCross [mergePoss*numMix+m3 ]  =  d; 
newSumCross [mergePoss ]  +=  d; 

if  (mixMask [m3] )  { 

if  (m3  ==  ml)  { 

/*  The  merged  component  will  be  replaced  by  ml  —  so 
this  is  the  new  self -likeness  entry  for  the 
comp  on  en  t  */ 

d  =  calcDist (mergep, mergeMu, mergeP, mergep, mergeMu, 
mergeP ) ; 

newSelf [mergePoss*numMix+m3 ]  =  d; 
newSumSelf [mergePoss ]  +=  d; 

}  else  if  (m3  ==  m2)  { 

/*  Under  the  possibility  being  considered  m2  would  be 
deleted  */ 

newSelf [mergePoss*numMix+m3 ]  =0; 

}  else  { 

/*  Calculate  self  entry  &  store  */ 
d  =  calcDist (mergep, mergeMu, mergeP, 
newProbs [m3] , &newMeans [m3*numVar]  , 

SnewCovs [m3*numVar*numVar] ) ; 
newSelf [mergePoss*numMix+m3 ]  =  d; 
newSumSelf [mergePoss ]  +=  2*d; 


}  else  {  /*  if  (ml  ==  mixl  \  \  m2  ==  mix2)  */ 

/*  If  the  merge  possibility  does  not  involve  the  modified 
component  then  we  just  need  to  update  the  appropriate 
new  self -likeness  term  */ 

d  =  calcDist (mergep, mergeMu, mergeP, 

newProbs [mixl] , &newMeans [mixl*numVar ] , 

SnewCovs [mixl*numVar*numVar] ) ; 
newSumSelf [mergePoss ]  +=  2*d  — 

2* newSelf [mergePoss *numMix+mixl ] ; 
newSelf [mergePoss*numMix+mixl ]  =  d; 
/* calcOrigCosts -- Populate the original cost matrix

   Precondition:  memory should be allocated for all structures; probs,
                  means and covs should contain parameters for original
                  mixture components

   Postcondition: cross and self matrices are populated; partial sums
                  are calculated; sumDist is calculated; mixture mask
                  flags are initialized */
void calcOrigCosts(void)
{
  int m1, m2, i, j;

  /* Zero out the partial sums */
  for (m1 = 0; m1 < numMix; m1++)
    sumCross[m1] = 0.0;

  /* Calculate similarity measure for every pair of components */
  for (m1 = 0; m1 < numMix; m1++) {
    for (m2 = m1; m2 < numMix; m2++) {
      i = m1*numMix+m2; j = m2*numMix+m1;
      cross[i] = cross[j] = self[i] = self[j] =
        calcDist(probs[m1],&means[m1*numVar],&covs[m1*numVar*numVar],
                 probs[m2],&means[m2*numVar],&covs[m2*numVar*numVar]);

      /* Update the partial sums for the two components */
      sumCross[m1] += cross[i];
      if (m1 != m2)
        sumCross[m2] += cross[i];
    }
  }

  sumDist = 0;
  for (m1 = 0; m1 < numMix; m1++) {
    /* Calculate partial self sum from cross sum (this contains the
       sum of the matrix row and column due to the respective
       component) */
    sumSelf[m1] = 2*sumCross[m1] - cross[m1*(numMix+1)];

    /* Calculate total sum for original mixture */
    sumDist += sumCross[m1];

    /* Initialize mask flags */
    mixMask[m1] = 1;
  }
}

/* calcOrigMergePoss -- Calculate all merge possibilities for original
                        mixture

   Precondition:  memory is allocated for structures, distance matrices
                  (self and cross) and partial sums are populated;
                  probs, means and covs contain parameters of original
                  mixture

   Postcondition: newSelf, newCross, newSumSelf and newSumCross are
                  populated to reflect the new entries for the self and
                  cross matrices if each pair of components is selected
                  for merging */
void calcOrigMergePoss(void)
{
  int m1, m2, m3, mergePoss;
  double d;

  for (m1 = 0; m1 < numMix; m1++) {
    for (m2 = m1+1; m2 < numMix; m2++) {
      mergePoss = mergePossNum(m1,m2);
      newSumCross[mergePoss] = 0;

      /* Calculate parameters for merging components m1 & m2 */
      calcMergeParam(m1,m2);

      /* Calculate distance of this merged component to all other
         components */
      for (m3 = 0; m3 < numMix; m3++) {
        d = calcDist(mergep,mergeMu,mergeP,
                     probs[m3],&means[m3*numVar],&covs[m3*numVar*numVar]);
        newSelf[mergePoss*numMix+m3] =
          newCross[mergePoss*numMix+m3] = d;
        newSumCross[mergePoss] += d;
      }

      /* Calculate self distance for component */
      d = calcDist(mergep,mergeMu,mergeP,mergep,mergeMu,mergeP);
      newSelf[mergePoss*numMix+m1] = d;
      newSelf[mergePoss*numMix+m2] = 0;

      newSumSelf[mergePoss] = 2*(newSumCross[mergePoss] -
                                 newCross[mergePoss*numMix+m1] -
                                 newCross[mergePoss*numMix+m2]) + d;
    }
  }
}

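
The listing calls mergePossNum(m1,m2) to map a pair of component indices to a row of the merge-possibility arrays, but its definition does not appear in this part of the appendix. One common choice, given here purely as an assumption and under the hypothetical name pairIndex, is to number the pairs with m1 < m2 in row order of the strict upper triangle:

```c
/* Hypothetical stand-in for mergePossNum (the thesis's own mapping may
   differ): maps a pair (m1, m2) with 0 <= m1 < m2 < n to a unique
   index in the range 0 .. n*(n-1)/2 - 1. */
int pairIndex(int m1, int m2, int n)
{
  /* number of pairs whose first index is below m1, plus the offset of
     m2 within the row for m1 */
  return m1*(2*n - m1 - 1)/2 + (m2 - m1 - 1);
}
```

With a mapping of this kind, an entry such as newSelf[mergePoss*numMix+m3] reads as row mergePoss (one row per candidate pair) and column m3 (one column per component) of a flat array with n*(n-1)/2 rows.
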
/* calcMergeParam -- Calculates the parameters (mean, cov, prob) for
                     merging a pair of components, puts them in the
                     global holding area mergep, mergeMu, mergeP
   Precondition:  newProbs, newMeans and newCovs contain the current
                  parameters of the reduced mixture; m1 and m2 contain
                  the indices of the components to be merged
   Postcondition: mergep, mergeMu and mergeP contain the weight, mean
                  and covariance for the component fitted to the pair of
                  components, with the parameters such that the overall
                  mean and covariance remain unchanged. muD is used for
                  temporary calculation.

   Note: only lower triangle of matrix is calculated; upper
         triangle is neither calculated nor populated */
void calcMergeParam(int m1, int m2)
{
  register int i, j, k;
  register double p1, p2;
  double di, *mean1 = &newMeans[numVar*m1],
             *mean2 = &newMeans[numVar*m2],
             *cov1 = &newCovs[numVar*numVar*m1],
             *cov2 = &newCovs[numVar*numVar*m2];

  p1 = newProbs[m1]; p2 = newProbs[m2];
  mergep = p1 + p2;
  di = 1.0/mergep;
  p1 *= di; p2 *= di;

  /* Calculate difference of means and combined mean */
  for (i = 0; i < numVar; i++) {
    muD[i] = mean1[i] - mean2[i];
    mergeMu[i] = p1*mean1[i] + p2*mean2[i];
  }

  /* Calculate combined covariance */
  for (i = 0; i < numVar; i++) {
    for (j = i; j < numVar; j++) {
      k = i + j*numVar;
      mergeP[k] = p1*cov1[k] + p2*cov2[k] + p1*p2*muD[i]*muD[j];
    }
  }
}
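
The moment-preserving merge can be tried in isolation. The following is a minimal standalone sketch, assuming a fixed 2-D state and full (not triangle-only) matrix storage. The name mergePair and the constant NVAR are hypothetical and are not part of the thesis code, which works on global arrays instead.

```c
#define NVAR 2  /* state dimension chosen for this illustration */

/* Merge two weighted Gaussian components into a single component that
   preserves the pair's total weight, overall mean and overall
   covariance. Matrices are NVAR x NVAR, stored as flat arrays. */
void mergePair(double w1, const double *mu1, const double *c1,
               double w2, const double *mu2, const double *c2,
               double *w, double *mu, double *c)
{
  int i, j;
  double a1, a2, dmu[NVAR];

  *w = w1 + w2;
  a1 = w1 / *w;               /* normalized weights within the pair */
  a2 = w2 / *w;

  for (i = 0; i < NVAR; i++) {
    dmu[i] = mu1[i] - mu2[i];
    mu[i] = a1*mu1[i] + a2*mu2[i];
  }

  /* weighted covariances plus the spread-of-means term */
  for (i = 0; i < NVAR; i++)
    for (j = 0; j < NVAR; j++)
      c[i + j*NVAR] = a1*c1[i + j*NVAR] + a2*c2[i + j*NVAR]
                    + a1*a2*dmu[i]*dmu[j];
}
```

For two equal-weight components with identity covariance and means (0, 0) and (2, 0), this gives weight 1, mean (1, 0) and covariance diag(2, 1): the variance grows only along the axis in which the means differ.
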


/* calcDist -- Calculate a single distance entry between the given
               parameters. This is the "engine", containing the
               highly optimized implementation of Eq. (3.46)
               described in Section 3.3.4.1.

   Precondition:  p1, mean1 and cov1, and p2, mean2 and cov2 contain the
                  parameters of the pair of components to be merged
   Postcondition: the similarity measure between the two components is
                  calculated and returned. The temporary structures muD
                  and P are used for the calculation. */
double calcDist(double p1, double *mean1, double *cov1,
                double p2, double *mean2, double *cov2)
{
  register int i, j, k;
  register double d, di;
  double diProd, cost;

  /* Calculate the sum of the two covariances and the difference
     of the two means (only calculate lower triangle of the covariance
     sum) */
  for (i = 0; i < numVar; i++) {
    for (j = i; j < numVar; j++) {
      k = i + j*numVar;
      P[k] = cov1[k] + cov2[k];
    }
    muD[i] = mean1[i] - mean2[i];
  }

  /* Divide right-most column by lower-right element */
  di = 1.0/P[numVar*numVar - 1];
  Di[numVar-1] = di;
  diProd = di*Inv2PI;
  for (j = 0; j < numVar-1; j++)
    P[j + numVar*(numVar-1)] *= di;

  /* Complete U-D factorization in-place */
  for (j = numVar-2; j >= 0; j--) {
    /* Calculate diagonal element for column */
    d = P[j*(numVar+1)];
    for (k = j+1; k < numVar; k++) {
      di = P[j+k*numVar];
      d -= P[k*(numVar+1)]*di*di;
    }
    P[j*(numVar+1)] = d;
    di = 1.0/d;
    Di[j] = di;
    diProd *= di*Inv2PI;

    /* Calculate rest of column */
    for (i = j-1; i >= 0; i--) {
      d = P[i+j*numVar];
      for (k = j+1; k < numVar; k++)
        d -= P[k*(numVar+1)]*P[i+k*numVar]*P[j+k*numVar];
      P[i+j*numVar] = d*di;
    }
  }

  if (mean1 == mean2) {
    /* Calculate self cost if the two components were the same */
    return p1*p2*sqrt(diProd);
  } else {
    /* Solve back-substitution with mean */
    di = 0;
    for (j = numVar-1; j >= 0; j--) {
      d = muD[j];
      for (i = j+1; i < numVar; i++)
        d -= muD[i]*P[j+i*numVar];
      muD[j] = d;
      di += d*d*Di[j];
    }

    /* Calculate cost & return */
    return p1*p2*exp(-0.5*di)*sqrt(diProd);
  }
}






Vita 


Flight Lieutenant Jason L. Williams graduated from Queensland University
of Technology in April 1999, receiving a Bachelor of Engineering (Electronics) with
First Class Honours, and a Bachelor of Information Technology with Distinction.
He joined the Royal Australian Air Force in 1996 through the undergraduate
sponsorship program. His first assignment was at the Electronic Warfare Squadron in
Adelaide, South Australia, where he received the E-Systems Commander’s Trophy
for Excellence in Electronic Warfare. In 2001 he was selected to study the Master of
Science in Electrical Engineering program at the United States Air Force Institute
of Technology, concentrating on Stochastic Estimation and Control and Signal
Processing. Upon graduation he will be assigned to the Aircraft Self Protection Systems
Program Office in Canberra, Australia.

Flight Lieutenant Williams is a member of Eta Kappa Nu and Tau Beta Pi,
as well as the Golden Key National Honor Society. He is a student member of the
Institute of Electrical and Electronics Engineers.




REPORT  DOCUMENTATION  PAGE 


1. REPORT DATE (DD-MM-YYYY)
25-03-2003

2. REPORT TYPE
Master’s Thesis

3. DATES COVERED (From - To)
Jul 2002 - Mar 2003

4. TITLE AND SUBTITLE
GAUSSIAN MIXTURE REDUCTION FOR TRACKING MULTIPLE MANEUVERING TARGETS IN CLUTTER

5a. CONTRACT NUMBER

5b. GRANT NUMBER

5c. PROGRAM ELEMENT NUMBER

5d. PROJECT NUMBER

5e. TASK NUMBER

5f. WORK UNIT NUMBER

6. AUTHOR(S)
Williams, Jason L., Flight Lieutenant, RAAF

7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES)
Air Force Institute of Technology
Graduate School of Engineering and Management (AFIT/EN)
2950 Hobson Way, Building 640
WPAFB OH 45433-7765

8. PERFORMING ORGANIZATION REPORT NUMBER
AFIT/GE/ENG/03-19

9. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES)
AFRL/SNAT
Attn: Mr. Stanton H. Musick
2241 Avionics Circle
WPAFB OH 45433-7765
DSN: 785-1115, ext 4292
e-mail: Stanton.Musick@wpafb.af.mil

10. SPONSOR/MONITOR’S ACRONYM(S)

11. SPONSOR/MONITOR’S REPORT NUMBER(S)

12. DISTRIBUTION/AVAILABILITY STATEMENT
APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED.

14.  ABSTRACT 

The  problem  of  tracking  multiple  maneuvering  targets  in  clutter  naturally  leads  to  a  Gaussian  mixture  representation  of  the  Probability  Density 
Function  (PDF)  of  the  target  state  vector.  State-of-the-art  Multiple  Hypothesis  Tracking  (MHT)  techniques  maintain  the  mean,  covariance  and 
probability  weight  corresponding  to  each  hypothesis,  yet  they  rely  on  ad  hoc  merging  and  pruning  rules  to  control  the  growth  of  hypotheses.  This 
thesis  investigates  the  performance  benefit  achievable  by  applying  a  structured  cost  function-based  approach  to  the  hypothesis  control  problem. 

A  new  cost  function,  the  Integral  Square  Difference  (ISD)  cost,  is  proposed  for  measuring  the  difference  between  the  full  target  state  PDF  and 
a  reduced-order  approximation.  The  ISD  cost  function  is  physically  meaningful,  and,  unlike  any  previously  proposed  cost  function,  it  is  also 
mathematically  tractable,  requiring  neither  numerical  integration  nor  approximation  for  evaluation.  A  reduction  algorithm  is  proposed  which 
selects  components  for  merging  or  pruning  to  minimize  the  increase  in  the  ISD  cost.  This  solution  is  used  directly,  and  also  as  the  starting  point  for 
an  iterative  gradient-based  optimization. 

The  performance  of  the  ISD-based  algorithm  for  tracking  a  single  target  in  heavy  clutter  is  compared  to  that  of  Salmond’s  joining  filter,  which 
previously  had  provided  the  highest  performance  in  the  scenario  examined.  For  a  large  number  of  mixture  components,  it  is  shown  that  the  ISD 
algorithm  outperforms  the  joining  filter  remarkably,  yielding  an  average  track  life  more  than  double  that  achievable  using  the  joining  filter.  The 
results  indicate  that  the  tracking  performance  of  the  ISD-based  filter  in  heavy  clutter  is  significantly  higher  than  achievable  using  any  previously 
published  algorithm. 


15.  SUBJECT  TERMS 

Radar tracking, Search radar, Automatic tracking, Track while scan, Radar clutter, Kalman filtering, Bayes’ theorem, Probability density
functions, Maximum likelihood estimation, Optimization, Statistical distributions, Stochastic processes


17. LIMITATION OF ABSTRACT
UU

18. NUMBER OF PAGES
247

19a. NAME OF RESPONSIBLE PERSON
Dr Peter S. Maybeck

19b. TELEPHONE NUMBER (Include area code)
(937) 255-3636, ext 4581; e-mail: Peter.Maybeck@afit.edu



